Poorly Stratified Data Leads to Mistakes in Analysis

Getting organizations to think of data as critical to making effective decisions is often a challenge. The very next problem is that, once data is being used, it is often misused more than it is used well.


What matters is not just having numbers mentioned when decisions are being made, or even having numbers mentioned when those decisions are evaluated after they have been implemented (of course, many organizations don’t even evaluate the results of many changes they adopt, but that is a different problem). What matters for “evidence-based decision making” is understanding what those numbers actually mean. It is easy to be misled if you don’t think critically about what the numbers tell you and what they do not.

Poorly stratified data is one problem that leads to mistakes in analysis.

From “How ZIP codes nearly masked the lead problem in Flint”:

As I ran the addresses through a precise parcel-level geocoding process and visually inspected individual blood lead levels, I was immediately struck by the disparity in the spatial pattern. It was obvious Flint children had become far more likely than out-county children to experience elevated blood lead when compared to two years prior.

How had the state so blatantly and callously disregarded such information? To me – a geographer trained extensively in geographic information science, or computer mapping – the answer was obvious upon hearing their unit of analysis: the ZIP code.

Their ZIP code data included people who appeared to live in Flint and receive Flint water but actually didn’t, making the data much less accurate than it appeared [emphasis added].
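To make the dilution concrete, here is a minimal Python sketch using made-up counts (these are NOT data from Flint, and the names CITY, ZIP_ONLY and COUNTY are hypothetical). It assumes a "Flint" ZIP code contains both children who really live in the city and receive city water and children who only share the mailing ZIP code, and compares the elevated blood lead rate under each grouping against a county comparison group.

```python
def rate(elevated: int, tested: int) -> float:
    """Share of tested children whose blood lead level is elevated."""
    return elevated / tested

# Hypothetical counts, invented for illustration only (NOT real Flint data).
# Each pair is (children with elevated blood lead, children tested).
# Children who live in the city and actually receive city water:
CITY = (60, 1000)
# Children with a "Flint" mailing ZIP code who live outside the city
# and do not receive city water:
ZIP_ONLY = (20, 1000)
# Comparison group from elsewhere in the county:
COUNTY = (40, 2000)

def compare_groupings():
    """Return (county rate, rate when grouping by ZIP code, rate by parcel)."""
    county = rate(*COUNTY)
    # ZIP-code analysis lumps both groups together as "Flint".
    by_zip = rate(CITY[0] + ZIP_ONLY[0], CITY[1] + ZIP_ONLY[1])
    # Parcel-level geocoding keeps only children who really live in the city.
    by_parcel = rate(*CITY)
    return county, by_zip, by_parcel

county, by_zip, by_parcel = compare_groupings()
print(f"County comparison group: {county:.1%}")
print(f"'Flint' by ZIP code:     {by_zip:.1%}")
print(f"Flint by parcel:         {by_parcel:.1%}")
print(f"Excess over county (ZIP vs parcel): "
      f"{by_zip - county:.1%} vs {by_parcel - county:.1%}")
```

With these invented numbers, grouping by ZIP code cuts the apparent excess over the county group in half. Nothing in the ZIP-level figure signals that half of the "Flint" children were never exposed to Flint water; only a finer unit of analysis reveals it.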

This type of faulty assumption about data, leading to mistakes in analysis, is common. Using data provides no benefit if the data isn’t used properly. The more misuse of data I see, the more importance I place on the skill of thinking critically. We must challenge assumptions and question what the data we look at actually means.


We must think about how the data could be giving a skewed view of the real-world situation it is being used to explain. Data can’t just be plugged into a spreadsheet or software to produce conclusions that can be safely accepted as true. People must continually question what the data does and does not tell us.

How to Use Data and Avoid Being Mislead by Data:

How Not to be Wrong is an excellent book by Jordan Ellenberg on how to use math to avoid making mistakes. A great deal of the book is about the dangers of misinterpreting data and how to avoid being misled.

In a previous post, Stratification and Systemic Thinking, I discussed another example of stratification. Stratifying the data by which nurse had treated each baby showed that the babies one nurse cared for were much healthier. Investigating further, they learned she carried a protective strain of staph that served to inoculate babies in her care against harmful staph infections.
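Stratifying by nurse amounts to grouping outcomes by one more variable and comparing rates across the groups. Below is a minimal Python sketch with invented records; the nurse labels and counts are hypothetical and are not taken from the case described in that post. It shows how a single pooled rate can hide a group that looks very different from the rest.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration:
# (nurse who cared for the baby, did the baby get a harmful staph infection?)
records = (
    [("nurse_a", True)] * 2 + [("nurse_a", False)] * 48
    + [("nurse_b", True)] * 10 + [("nurse_b", False)] * 40
    + [("nurse_c", True)] * 9 + [("nurse_c", False)] * 41
)

def infection_rates_by_nurse(records):
    """Stratify: compute the infection rate separately for each nurse."""
    counts = defaultdict(lambda: [0, 0])  # nurse -> [infections, babies]
    for nurse, infected in records:
        counts[nurse][0] += int(infected)
        counts[nurse][1] += 1
    return {nurse: infections / babies
            for nurse, (infections, babies) in counts.items()}

# The pooled (unstratified) rate: one number for the whole ward.
overall = sum(infected for _, infected in records) / len(records)
rates = infection_rates_by_nurse(records)

print(f"All babies: {overall:.1%}")
for nurse, r in sorted(rates.items()):
    print(f"{nurse}: {r:.1%}")
```

The pooled rate alone gives no hint that one nurse's babies fare much better; the stratified rates do. That difference is a prompt for investigation, not an explanation in itself.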

Stratification can help provide insight into the nature of the real-world situation. Deciding which conclusions are sensible to draw, and which are not, is aided by an understanding of statistics and by expert knowledge of the situation the data is drawn from. At times the data may conflict with expert knowledge. With appropriate respect for evidence and experimentation, those surprising results can be challenged (and often explained), or they can hold up to those challenges and then be subjected to experiments to learn more about what is going on.

Related: Stratify Data to Hone in on Special Causes of Problems | Seeing Patterns Where None Exists | How to Use Data and Avoid Being Mislead by Data | Don’t Forget the Proxy Nature of Data
