You can easily reach misleading conclusions if you don’t carefully slice your data, consider lurking variables, and think about causality.
Consider the following true stories:
A lower percentage of Democrats than Republicans voted in favor of the 1964 Civil Rights Act, despite the fact that it was originally proposed by one Democratic president (Kennedy) and eventually signed by another (Lyndon Johnson).
In the 1970s, women were admitted to graduate programs at Berkeley at much lower rates than men, leading to the threat of a gender discrimination lawsuit in 1973. Yet an exhaustive inquiry found no evidence of gender discrimination.
Voters with incomes above $50,000 were more likely to vote for Trump in 2016 than voters with incomes below $50,000, yet political scholarship and conventional wisdom heavily attribute support for Trump to frustrated working-class voters.
A 1986 study found that non-invasive kidney stone removal had a higher success rate than traditional open surgery, yet open surgery remained the standard procedure.
Across the US economy, overall earnings have risen since 2000, even though earnings for every education bracket have declined.
How can it be that the numbers, used by intelligent people, tell opposing stories?
These stories are examples of Simpson’s Paradox. In words, it can be divided into two fallacies:
Fallacy of Division: What is true of the whole is not always true of the parts
Fallacy of Composition: What is true of the parts is not always true of the whole
These two maxims hold true in every single one of the stories described above.
Consider the numbers from the story of the Civil Rights Act. In 1964, the numbers in the Senate looked like this:
A higher percentage of Republicans voted for the Civil Rights Act than Democrats, despite the Act being the darling of Democratic Presidents Kennedy and Johnson. Slice the data by region, though, and you find that the vote split almost entirely along old sectional lines. A century after the Civil War, Democrats were still the dominant party in the eleven states of the former Confederacy and Southern Democrats voted overwhelmingly against the Act. Of the Southern states, all except Texas had two Democratic senators who voted “Nay.”
Only one Southern Senator — a Texas Democrat, Ralph Yarborough — voted for the Civil Rights Act. In the North, both parties voted overwhelmingly in favor of the Civil Rights Act, though Democrats were even more enthusiastic (98%) than Republicans (84%). Very similar ratios prevailed in the House.
Southern and Northern Democrats considered together did not at all resemble Northern or Southern Democrats considered separately. The lurking variable was the importance of regional attitudes at a time that Jim Crow was being challenged and dismantled. Constituencies in the South were far more hostile to the Act than constituencies elsewhere.
The other stories from our list also all suffer from lurking variables.
At Berkeley, it turned out that women and men applied to different programs and departments within the Graduate Division. Women gravitated toward programs with fewer available positions and lower acceptance rates, while men gravitated toward programs with higher acceptance rates. Since applications were handled on a departmental basis, the lurking variable is the department students applied to.
During the 2016 election, voters earning less than $50,000 a year did indeed favor Hillary because voters in that bucket are disproportionately represented by ethnic minorities who typically vote Democrat. White working-class voters favored Trump more strongly than they did previous Republican candidates — enough to swing several battleground states. Race (and, potentially, urbanicity) is the lurking variable in this case.
In the case of kidney stone treatments, it turns out that less-invasive surgeries were performed on less severe cases of kidney stones, explaining the higher success rate. The lurking variable, in this case, is the severity of the ailment.
The median income in the United States continues to rise even as income for people at every level of education declines because a much higher proportion of people now have bachelor’s degrees. The lurking variable here is the change in time of the proportions of people with different levels of education.
For practical business purposes, plenty of dimensions can function as lurking variables. Local and international users might be present in different proportions and respond differently to certain products or marketing initiatives. Your pool of lower-margin customers might grow at a different rate than your pool of higher-margin customers.
All of these can produce seemingly conflicting measurements and deserve careful investigation. Measuring the right things is hard, and you don’t want to leave figurative stones unturned. As a rule of thumb, numbers must be analyzed at deeper than face value. The sniff test is whether you have accounted for all obvious outside influences, and above all whether you can construct a coherent story about how these measurements came about.
P.S. Don’t panic, but double (and even n-tuple) Simpson’s Paradoxes are possible, too!
Update your browser to view this website correctly. Update my browser now