If you took stats or even high-school math, you may have heard your teacher emphasize the phrase “Correlation does not equal causation.” She was right: it doesn’t. That’s never clearer than when you take ridiculous data, throw it at the wall, and see what sticks.
Its popularity is probably correlated with the number of tweets and articles about it, but the website Spurious Correlations by Tyler Vigen may be those math teachers’ new go-to resource for bolstering their tried-and-true phrase. Vigen makes his point beautifully by taking literally thousands of data points and letting users find correlations between quantities that vary over time.*
So, for example, you can see that the data supports the idea that the more movies Nic Cage appears in, the fewer people die in helicopter crashes. Or, the more the U.S. spends on science, the more citizens commit suicide.
But this is exactly why the Correlation Does Not Equal Causation mantra is so important. When the numbers line up, our brains try to find reasons behind the connection, even if there are none. That’s why a correlation can hint at causation, but never proves it.
We could speculate as to why we store more uranium for every math PhD awarded.
We could attribute the shocking increase in women who tripped in New York and died to a lack of copyrighted images.
Maybe people get so depressed after buying a German car that they feel compelled to destroy themselves along with it.
Or perhaps people get so distracted by our dwindling honey production that they fall down stairs to their death.
But this is all speculation. Without a sound and tested theory to explain them, these are just meaningless relationships between numbers over time. A true causal claim needs at least three essential components:
1. A true correlation: If we want to say that X causes Y, then X should actually be positively or negatively related to Y.
2. Time order: If we want to say that X causes Y, it cannot be equally true that Y causes X. Something has to happen first and cause a change.
3. No confounding variables: Once you have two (or more) correlated events with a time order that makes sense, you must make sure that there are no lurking variables mixing or masking what you found.
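The third component can be sketched with a small simulation. This is only an illustration on made-up data, not anything taken from Vigen’s site: the variable names (temperature, ice_cream, sunburns), the coefficients, and the straight-line adjustment in residuals are all assumptions chosen for the example. Two quantities that never influence each other still end up correlated because a lurking variable drives both, and the correlation roughly vanishes once that variable is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after subtracting a least-squares straight-line fit on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - (mean_y + slope * (z - mean_z)) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical lurking variable (made-up data).
temperature = [rng.gauss(0, 1) for _ in range(2000)]
# Two made-up outcomes that each depend on temperature, but not on each other.
ice_cream = [2.0 * t + rng.gauss(0, 1) for t in temperature]
sunburns = [1.5 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream, sunburns)
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(sunburns, temperature))
print(f"raw r = {raw_r:.2f}; after removing temperature, r = {adjusted_r:.2f}")
```

With these assumed coefficients the raw correlation should come out strongly positive, while the adjusted one should sit close to zero. Real confounders are rarely this tidy, so treat the straight-line adjustment as a toy.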
It’s fun to mess around with silly examples of correlation obviously not equaling causation, but it can get very serious.
The contention that vaccines cause autism is the largest and most harmful misunderstanding of correlation and causation of our time. Autism is being diagnosed more often, and more children are being vaccinated. The data could certainly line up. But as with Nic Cage and helicopter deaths, the supposed link between vaccines and autism has nothing to stand on. There is no theory to link the data together. Study after study has found no connection. It’s as false as any link you can make on Tyler Vigen’s website.
Good science is built on good data, but even more so on using that data correctly. It’s fun to mess around with correlations, as Vigen has brilliantly shown, but only if you realize that numbers can lie. As they say, there are “lies, damned lies, and statistics.”
Kyle Hill is the Chief Science Officer of the Nerdist enterprise. Follow the nerdery on Twitter @Sci_Phile.
*The correlation coefficient ranges from -1 to 1. The closer the number is to -1 or 1, the closer the data are to a perfect inverse or positive relationship, respectively. A value near 0 indicates little or no linear relationship.
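For readers curious how that number is computed, here is a minimal sketch of the standard Pearson formula: the covariance of the two series divided by the product of their spreads. The function name correlation_coefficient and the tiny data sets are made up for illustration, and this is not necessarily how Vigen’s site calculates its figures.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson correlation of two equal-length lists, computed step by step."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its own series' average.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Numerator: how much the two series move together.
    together = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Denominator: how much each series moves on its own.
    spread = sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))
    return together / spread


years = [1, 2, 3, 4, 5]                                          # made-up time axis
rises_in_step = correlation_coefficient(years, [2, 4, 6, 8, 10])  # perfect positive
falls_in_step = correlation_coefficient(years, [10, 8, 6, 4, 2])  # perfect inverse
no_clear_link = correlation_coefficient(years, [3, 1, 4, 1, 5])   # weak relationship

print(rises_in_step, falls_in_step, round(no_clear_link, 2))
```

The first two results land at the ends of the scale, and the third falls somewhere in between.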