Correlation versus Causation

Correlation does not imply causation

Correlation

Correlation is a statistical measure of how strongly two variables move together. On its own, it says nothing about whether one variable causes the other. It is expressed as a correlation coefficient whose value ranges from -1.0 to 1.0. A value of exactly 1 is a 'perfect positive correlation': the two variables always move in the same direction, in a perfectly linear relationship. A value of exactly -1 is a perfect negative correlation, and a value near 0 means there is little or no linear relationship.
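To make the coefficient concrete, here is a minimal sketch that computes it by hand. It assumes the coefficient meant here is the Pearson correlation coefficient; the function name pearson_r and the small hours dataset are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each point from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products and sums of squares.
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Illustrative (invented) data: two exact linear functions of `hours`.
hours = [1, 2, 3, 4, 5]
same_direction = [2 * h + 1 for h in hours]       # rises as hours rises
opposite_direction = [10 - 3 * h for h in hours]  # falls as hours rises

r_pos = pearson_r(hours, same_direction)      # perfect positive correlation
r_neg = pearson_r(hours, opposite_direction)  # perfect negative correlation
print(r_pos, r_neg)
```

Because both derived series are exact linear functions of hours, the coefficient lands at the two ends of the range. Real data almost always falls somewhere in between.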

Causation

Causation implies a cause-and-effect relationship between two variables. Simply put, a change in one variable's value brings about a change in the other variable's value.

Correlation Does Not Imply Causation

Why is this saying emphasized everywhere? Because confusing the two leads to misinterpreting data. Even when two variables are correlated, it does not necessarily follow that there is a cause-and-effect relationship between them.

Let's take a look at an example! Figure 1 shows a strong correlation between two variables: ice cream sales and shark attacks. (The data is normalized here so that the trend is easier to read.) Now the question is: does this mean that ice cream sales cause shark attacks? Well, the answer is "not necessarily".

Figure 1: Ice Cream Sales vs. Shark Attacks. (Source: Florida Museum)
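A side note on the normalization: rescaling each series changes how the curves look on a shared axis, but not how strongly they are correlated. The sketch below assumes min-max scaling to the range 0 to 1, which the original does not specify. The monthly numbers are invented for illustration and are not the Florida Museum data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Invented monthly figures (January to December), for illustration only.
ice_cream_sales = [120, 135, 180, 260, 340, 420, 460, 450, 330, 240, 150, 125]
shark_attacks = [1, 1, 2, 3, 5, 7, 8, 8, 5, 3, 2, 1]

r_raw = pearson_r(ice_cream_sales, shark_attacks)
r_scaled = pearson_r(min_max(ice_cream_sales), min_max(shark_attacks))
print(r_raw, r_scaled)  # the two values agree up to floating-point rounding
```

Min-max scaling is a linear transformation with a positive slope, so it leaves the coefficient unchanged. It only puts both series on the same 0 to 1 axis, so their shapes can be compared by eye.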

What Could be the Explanation?

When it comes to causal relationships, we need to be extremely careful, especially with time-series data, where two variables may simply follow the same trend over the same period. In the example above, both series were recorded over the same period. That alone does not mean that ice cream sales cause shark attacks, or vice versa. It is therefore always worth investigating whether a third variable, such as temperature in this case, drives both.
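The third-variable explanation can be sketched with a small simulation. Everything here is an assumption made for illustration: the data is synthetic, temperature is constructed to drive both series, and the coefficients and noise levels are arbitrary. The helper partial_r uses the standard first-order partial correlation formula to ask how correlated the two series remain once temperature is accounted for.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Synthetic data: temperature is the only shared driver. Neither
# ice cream sales nor shark attacks appears in the other's formula.
temperature = [rng.uniform(15, 35) for _ in range(n)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temperature]
sharks = [0.5 * t + rng.gauss(0, 1.5) for t in temperature]

r_raw = pearson_r(ice_cream, sharks)
r_given_temp = partial_r(ice_cream, sharks, temperature)
print(f"raw correlation:          {r_raw:.2f}")
print(f"controlling for temp:     {r_given_temp:.2f}")
```

In this constructed setting, the raw correlation is strong even though neither variable influences the other, and it drops to roughly zero once temperature is controlled for. With real data, a partial correlation near zero would be consistent with a common cause but would not prove one.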

Don't Jump to Conclusions!

As mentioned, correlation does not imply causation, but it does hint at where the data deserves further investigation. We must therefore be careful not to read a causal relationship into variables that only show a correlation.