Torturing the data till it lies

“Top 10 states for left-handers!” “Worst states for tall people” “Best country to travel to if you are 45!”

The Web is rife with news features like this. Recipe: Assemble a basket of social measures for states or nations. Blend, rank and present as a measure of some condition. They are usually built as galleries of images or pages. Even as a reward for multiple clicks, they rarely offer a reader-friendly at-a-glance list.

The biggest problem with rankings like this: They use grouped data to conclude something about experiences that are much more tightly linked to local and personal factors.

This is the ecological fallacy. Put simply, you often can’t infer something about individuals just because you have data about a group of them. This is especially true if the link that’s being claimed is barely plausible.


A simple and famous example: In the 1930 Census, a strong correlation existed between states’ English literacy rates and their shares of foreign-born people. But were immigrants more likely to be literate in English than native-born Americans? No. Census data for individuals showed the opposite, of course: Immigrants were less likely than natives to be literate in English. But immigrants had clustered in states with relatively high literacy rates, so grouped data made them seem more literate than natives.
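To see how grouped data can point the wrong way, here is a minimal Python sketch. The three states and all of their counts are invented for illustration; they are not the 1930 Census figures. The names `states` and `pearson` are likewise just assumptions made for this example.

```python
from math import sqrt

# Hypothetical counts, invented for illustration (NOT Census figures).
# Each state: (literate natives, all natives, literate immigrants, all immigrants)
states = {
    "A": (760, 950, 35, 50),
    "B": (765, 850, 120, 150),
    "C": (735, 750, 220, 250),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Grouped (state-level) view: foreign-born share vs. overall literacy rate.
foreign_share = [it / (nt + it) for (_, nt, _, it) in states.values()]
literacy_rate = [(nl + il) / (nt + it) for (nl, nt, il, it) in states.values()]
state_corr = pearson(foreign_share, literacy_rate)

# Individual-level view: pool every person across all states.
native_rate = (sum(s[0] for s in states.values())
               / sum(s[1] for s in states.values()))
immigrant_rate = (sum(s[2] for s in states.values())
                  / sum(s[3] for s in states.values()))

print(f"State-level correlation: {state_corr:.2f}")
print(f"Native literacy rate:    {native_rate:.1%}")
print(f"Immigrant literacy rate: {immigrant_rate:.1%}")
```

In this made-up data, the states with more immigrants have higher literacy, so the state-level correlation is strongly positive. Yet in every state, and overall, immigrants are less literate than natives. Both things are true at once, which is why the grouped numbers can’t answer the question about individuals.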

Another example: In the presidential election of 1968, segregationist George Wallace won the electoral votes of AL, AR, GA, LA and MS. These states had among the highest shares of black voters. Should we conclude that blacks voted strongly for Wallace? Of course not. State totals say nothing about which individuals cast those votes.

States – diverse collections of people acting through laws and policies – exert little or no effect on many conditions in daily life, such as crime. And most social conditions vary within a state far more than they do among states. Data journalists spend a lot of time and sweat trying to get this right by collecting *local* crime rates or student-teacher ratios before they start probing for patterns.

There are legitimate times to rank states, most obviously on something the state government itself can affect directly, like the climate for startup businesses or the strength of consumer protection laws.

And USA TODAY has run such lists from content partners. They can be fun, clickable lists. But they really don’t tell us anything about ourselves.

So if your state ranks low as a place to be a coin collector or a Chevy driver, don’t fret.

–Paul Overberg



