Here's what will happen in the 2018 election (POLL)
Do you think Democrats will retake the House, the Senate, or both in the 2018 election?
Now that you’ve answered, let’s consider the limits of data science in 2018.
The best data scientists in the world agreed with nearly absolute certainty. These scientists determined, beyond a reasonable doubt, that Hillary Clinton would be the 45th President of the United States. That was in 2016.
Those scientists were wrong.
Many of those same scientists applied their advanced degrees and supercomputers to the next problem: how a Trump presidency would affect the US economy. Joined by several Nobel-winning economists, the data scientists came to one conclusion with absolute certainty: Trump would sink the US economy, and smart people should sell 100% of their stock immediately following Trump’s win.
Those scientists and economists were all wrong.
It’s easy to blame partisanship and bias for these huge errors by the best data scientists and economists in the world. And it’s likely that bias played a role in their embarrassing failures to predict the future. But bias wasn’t the only problem. Nor was it the primary error.
The primary error in data science is its confidence in data.
Now, I’m a huge proponent of data science. I work with big companies who under-invest in analyzing their own data and in studying publicly available data. Data analysis and lightweight artificial intelligence and machine learning can greatly improve business results. I’ve helped smart companies achieve amazing growth through a combination of data science and human behavioral science. It’s what I do for a living.
(Find out how I used data science to predict that Trump had a good chance of winning.)
But I also know the limits of data science. And those limits are far more humbling than many data scientists admit. The biggest limit comes from unknowns. Scientists call these “confounding variables.” An unknown confounding variable may be knowable eventually, but within our present limits of knowledge its effect cannot be measured or accounted for. I’ll give you an example.
Say I want to test a hypothesis. My hypothesis is that trees begin to change colors as a result of temperature changes in the fall. I also want to factor out some variables I know could affect the trees: humidity, cloud cover, rainfall, and heat stress. Then I run my test over 3 years.
In the end, I discover a perfect correlation between temperature change and leaf color change.
Only later do I learn that I missed one other variable: sunlight hours. In the fall in subtropical zones, the hours of daylight decrease and the hours of darkness grow. In October, trees in my part of North America receive several fewer hours of sunlight than they did in June.
Upon further investigation, I learn that scientists had long ago determined that hours of sunlight, not temperature, cause trees to go dormant for the winter. Their transition from active growth to dormancy causes their leaves to change color. An oak tree in coastal California, where fall temperatures are often warmer than in mid-summer, turns colors just as it does in St. Louis. Here’s the science, according to the United States National Arboretum:
In late summer or early autumn, the days begin to get shorter, and nights are longer. Like most plants, deciduous trees and shrubs are rather sensitive to length of the dark period each day. When nights reach a [threshold value](https://usna.usda.gov/PhotoGallery/FallFoliage/ScienceFallColor.html#threshold values) and are long enough, the cells near the juncture of the leaf and the stem divide rapidly, but they do not expand. This [abscission layer](https://usna.usda.gov/PhotoGallery/FallFoliage/ScienceFallColor.html#abscisson layer) is a corky layer of cells that slowly begins to block transport of materials such as carbohydrates from the leaf to the branch. It also blocks the flow of minerals from the roots into the leaves. Because the starting time of the whole process is dependent on night length, fall colors appear at about the same time each year in a given location, whether temperatures are cooler or warmer than normal.
It took science a long time to figure that out. The problem is pretty simple and easily tested both in the laboratory and in the wild, yet trees are subject to many variables: wind, moisture, cloud cover, heat stress, terrain, parasites, deer, beavers, etc. But people have been studying trees for many, many years. And trees are less complex than the human brain.
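If you want to see how a missed variable can fool you, here is a minimal sketch of the tree example in Python. Everything in it is an assumption made up for illustration: the sinusoidal night-length curve, the 11.5-hour threshold, the temperature formula, and the noise levels are not real botanical or weather data. In this toy world, leaf color depends only on night length, yet temperature still correlates strongly with color because both follow the calendar.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(years=3, seed=0):
    """Return (temperatures, night_lengths, leaf_color) for a toy autumn."""
    rng = random.Random(seed)
    temps, nights, colors = [], [], []
    for _ in range(years):
        for day in range(213, 335):  # roughly August through November
            # Assumption: night length is a sinusoid, longest near Dec 21.
            night = 12 + 3 * math.cos(2 * math.pi * (day + 10) / 365)
            # Assumption: temperature falls as nights lengthen, plus weather noise.
            temp = 30 - 3 * (night - 9) + rng.gauss(0, 2)
            # Assumption: color depends ONLY on night length past a threshold.
            color = max(0.0, night - 11.5) + rng.gauss(0, 0.2)
            temps.append(temp)
            nights.append(night)
            colors.append(color)
    return temps, nights, colors


if __name__ == "__main__":
    temps, nights, colors = simulate()
    print(f"temperature vs. color: {pearson(temps, colors):+.2f}")
    print(f"controlling for night length: {partial_corr(temps, colors, nights):+.2f}")
```

The first number should come out strongly negative, which is what my three-year test would have found. The second should sit near zero, because once night length is accounted for, temperature has nothing left to explain. The catch, of course, is that you can only control for a variable you know to measure.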
Now, let’s go back to the problem of modern data science. Data scientists are mostly concerned with how people will behave at some point in the future. These scientists don’t care why leaves change colors in the fall. They care about how people (consumers) will respond to the leaves changing.
People are more complicated than trees, at least when it comes to their behavior. The factors that influence human behavior have also been studied for centuries. But our understanding of the factors that influence our behaviors is limited. And even the variables we know about are so varied and numerous that predicting how one variable affects all the others is as much art as science. (For example, shoppers who receive a free sample of luxury chocolate candy at a kiosk in a mall are more likely to make a purchase from a luxury retailer in the mall than the same shoppers in a mall that doesn’t give away free luxury candy.)
Which brings us back to the 2018 election.
It’s very possible that Democrats will take over the House and the Senate. It’s also possible that Republicans could increase their majorities in both houses. It’s also possible that something in between will happen. I don’t know. Neither do you. And neither do the greatest data scientists alive.
That’s the point. When you hear predictions, don’t be fooled by the math and science used to bolster those predictions. The scientists who did the work, usually in good faith, don’t know the variables they don’t know. Nor do they know the likelihood of a new variable creeping into the equation. Nor can they factor in the influence of those infinite unknown variables. Take all predictions about elections with a grain of salt, and be especially circumspect if the prediction comes with a lot of easy-to-understand charts and graphs. And, if you have a strong belief in science, you are actually more susceptible to believing in charts and graphs.
In a study published in 2014, researchers showed how influential charts and graphs can be. From their abstract:
The appearance of being scientific can increase persuasiveness. Even trivial cues can create such an appearance of a scientific basis. In our studies, including simple elements, such as graphs (Studies 1–2) or a chemical formula (Study 3), increased belief in a medication’s efficacy. This appears to be due to the association of such elements with science, rather than increased comprehensibility, use of visuals, or recall.
And people who believe in science are most gullible:
Belief in science moderates the persuasive effect of graphs, such that people who have a greater belief in science are more affected by the presence of graphs (Study 2). Overall, the studies contribute to past research by demonstrating that even trivial elements can increase public persuasion despite their not truly indicating scientific expertise or objective support.
When you see scientific-looking studies predicting with 98% confidence how the 2018 election will turn out, remember this study and this blog. In fact, check out this chart which shows you are more likely than others to share this blog post on Twitter or Facebook. You’re also slightly more likely than others to remember this blog post when the actual results of the election are announced in November.