Correlation measures the numerical relationship between two variables.
A high correlation coefficient (close to 1) does not mean that we can conclude with certainty that one variable actually causes the other.
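To make "close to 1" concrete, here is a minimal sketch of the Pearson correlation coefficient, which is the method pandas `corr()` uses by default. The function name `pearson` is hypothetical and written only for this illustration. It shows that identical data give exactly 1 and perfectly opposite trends give exactly -1.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y move in the same direction
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations: the spread of each variable around its mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # Dividing by both spreads keeps the result between -1 and 1
    return cov / math.sqrt(ss_x * ss_y)


ice_cream_sale = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
print(pearson(ice_cream_sale, ice_cream_sale))                  # 1.0
print(pearson(ice_cream_sale, list(reversed(ice_cream_sale))))  # -1.0
```

The coefficient only says how tightly the points follow a straight line. It says nothing about which variable, if any, drives the other.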
A classic example: at a beach, ice cream sales increase, and at the same time drowning accidents increase as well.
Does this mean that the increase in ice cream sales is a direct cause of the increase in drowning accidents?
Here, we constructed a fictional data set for you to try:
import pandas as pd
import matplotlib.pyplot as plt
# Fictional data: both lists are identical on purpose
Drowning_Accident = [20,40,60,80,100,120,140,160,180,200]
Ice_Cream_Sale = [20,40,60,80,100,120,140,160,180,200]
# Combine both variables into one DataFrame
Drowning = pd.DataFrame({"Drowning_Accident": Drowning_Accident,
                         "Ice_Cream_Sale": Ice_Cream_Sale})
# Scatter plot of drowning accidents against ice cream sales
Drowning.plot(x="Ice_Cream_Sale", y="Drowning_Accident", kind="scatter")
plt.show()
# Correlation matrix of the two columns (Pearson by default)
correlation_beach = Drowning.corr()
print(correlation_beach)
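Output: a scatter plot in which all points lie on one straight line, and a correlation matrix in which every coefficient is 1.0 (a perfect positive correlation), because the two columns contain identical values.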
In other words: can we use ice cream sales to predict drowning accidents?
The answer is: probably not.
It is likely that the correlation between these two variables does not reflect a direct cause-and-effect link. For example, both may rise simply because more people visit the beach.
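A minimal sketch of how a shared cause can produce a strong correlation, using made-up numbers. Here a hypothetical `visitors` count (not part of the original data set) drives both variables, and the day-to-day variation terms were deliberately chosen to be unrelated to `visitors` and to each other; real data would not be this tidy. The helpers `pearson` and `residuals` are hypothetical. The second step is a partial correlation: a straight-line fit on `visitors` is removed from each variable before correlating what is left.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def residuals(ys, xs):
    """Return what is left of ys after subtracting a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical common cause: beach visitors per day (made-up numbers)
visitors = [100, 200, 300, 400, 500, 600, 700, 800]

# Each variable = a multiple of visitors plus variation unrelated to the other variable
ice_cream_sale = [230, 370, 570, 830, 1030, 1170, 1370, 1630]  # 2 * visitors +/- 30
drowning_accident = [2, 1, 4, 3, 4, 7, 6, 9]                   # visitors / 100 +/- 1

# Raw correlation: strong, although neither variable is computed from the other
raw_r = pearson(ice_cream_sale, drowning_accident)
print(round(raw_r, 2))  # 0.91

# Partial correlation: correlate what remains after removing the visitors effect
partial_r = pearson(residuals(ice_cream_sale, visitors),
                    residuals(drowning_accident, visitors))
print(abs(partial_r) < 1e-9)  # True
```

Once the shared influence of `visitors` is taken out, nothing connects the two variables in this constructed example, which is exactly the situation where a high raw correlation would mislead.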
What causes drowning then?
Let us reverse the argument:
Does a low correlation coefficient (close to zero) mean that a change in x does not affect y?
Back to the question:
The answer is no.
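One reason, sketched with made-up numbers: the Pearson coefficient (the default used by pandas `corr()`) only measures how well the points fit a straight line. Below, y is completely determined by x through y = x², yet the coefficient is zero because the values are symmetric around x = 0. The helper `pearson` is hypothetical, written only for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


x = list(range(-5, 6))   # -5, -4, ..., 5: symmetric around zero
y = [v ** 2 for v in x]  # y is completely determined by x

# No linear trend: the fall on the left cancels the rise on the right
print(pearson(x, y))  # 0.0
```

A coefficient near zero therefore rules out a straight-line relationship, not every relationship.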
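There is an important difference between correlation and causality: correlation is a number that measures how closely the data are related, while causality is the conclusion that x causes y.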
Tip: Always reflect critically on the concept of causality when making predictions!