Let’s do the math! Not really. We’ll let Excel do it for us.
Now that we have all the values we need for the formula, let’s plug them in and work out the result. On calculating, we get a correlation value of 0.930. This is a very strong positive correlation. Let’s try plotting these out for a more intuitive understanding.
What do you see in this graph? There’s an almost linear positive relationship between our two variables. This supports our hypothesis that there’s a positive relationship between the height and weight of an individual.
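If you’d rather not open Excel, here’s a minimal Python sketch of the same correlation calculation. The heights and weights below are made up purely for illustration (they are not the data used in this post), and pearson_r is just a name I picked for the helper, so don’t expect it to print 0.930.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), invented for this example.
heights = [150, 155, 160, 165, 170, 175, 180, 185]
weights = [52, 55, 61, 60, 68, 72, 75, 83]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A value close to +1 means the points sit near an upward-sloping line, a value close to -1 means a downward-sloping one, and a value near 0 means no linear relationship.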
But we’re not done yet. While there appears to be some relation between the height and the weight of an individual, there’s still a chance that the correlation you found came about by coincidence. Are these values actually related, or is it spurious correlation at play? This happens when two variables show common trends and have a strong correlation, but are not actually related. For example, there’s a pretty strong correlation (0.66) between the number of people who drowned by falling into a pool and the number of films Nicolas Cage has appeared in. Does that mean that one event causes the other? No (Maybe ;). It means that two unrelated variables (events) may have a high correlation without depending on each other, or that there’s a third variable we’re not taking into account. This may seem obvious from the Nicolas Cage example, but it is a very common mistake that data scientists make while engineering features for their mathematical models.
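To make the “third variable” idea concrete, here’s a small simulation sketch. Everything in it is an assumption for illustration: the hidden cause, the noise levels and the variable names are all invented. Y and Z never appear in each other’s formula, yet they end up correlated because both depend on the same hidden cause.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden common cause (think: daily temperature).
cause = [random.gauss(0, 1) for _ in range(n)]

# Y and Z each depend on the common cause plus their own independent noise.
y = [c + random.gauss(0, 1) for c in cause]
z = [c + random.gauss(0, 1) for c in cause]
r_yz = corr(y, z)

# Subtract the common cause from both and correlate what is left.
# This works exactly here only because we built the data ourselves;
# with real data the cause's contribution would have to be estimated.
y_rest = [yi - ci for yi, ci in zip(y, cause)]
z_rest = [zi - ci for zi, ci in zip(z, cause)]
r_rest = corr(y_rest, z_rest)

print(f"corr(Y, Z) = {r_yz:.2f}")
print(f"corr(Y, Z) with the common cause removed = {r_rest:.2f}")
```

The first number should come out clearly positive and the second close to zero: once the shared driver is accounted for, nothing ties Y and Z together.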
So how do we figure out if a correlation is spurious or not?
“Suppose both lamps in a room go out suddenly. We regard it as improbable that by chance both bulbs burned out at the same time and look for a burned out fuse or some other interruption of the common power supply. The improbable coincidence is thus explained as the product of a common cause.”
Simply put, Reichenbach’s principle states that:
if two physical variables Y and Z are found to be statistically dependent, then there should be a causal explanation of this fact, either:
For example, if a correlation is found between dog ownership and video game sales, it implies that:
If we encounter situations like those in #1 and #2, we’re good. We’ve found what statisticians call a causal relationship. No, not casual. Causal. It’s the situations in #3, #4, and #5 that are tough to crack. It’s just that we’re so desperate to believe that a relationship exists between two seemingly unrelated events that we come up with a third event that we believe justifies the mathematically perceived correlation. Causal relationships are crucial in diverse fields like finance (Does an event in Hong Kong impact the price of a share in New York?) or the pharmaceutical industry (Does a particular drug have the necessary impact on the symptoms shown, or is there another factor that’s causing those symptoms?).
I just realised I’m overstaying my welcome with a post this long. Let’s do this. We’ll pause here for a while, I’ll let you think/research about some (if any) of the stuff you found interesting, and I’ll continue with a deep dive into causal networks in the next post. Till then, Cheers.