Data Can’t Lie

Many people state that data can lie. Obviously data can’t lie.

There are three kinds of lies: Lies, damn lies and statistics – Mark Twain

Many people don’t understand the difference between being manipulated because they can’t understand what the data really says and the data itself “lying” (which, of course, doesn’t even make sense). The same confusion can arise when someone just draws the wrong conclusion from the data that exists (and then blames the data for “lying” instead of themselves for drawing a faulty conclusion). The data can be wrong (and the data can even be made faulty intentionally by someone). Or someone can draw the wrong conclusion from data that is correct. But in neither case is the data lying. It is also common to believe the data means something other than what it does (therefore leading to a faulty conclusion).

For a very simple example, consider believing that if the average height for adults in the USA is 5 feet 9 inches, then half the adults must be taller than 5 feet 9 inches and half must be shorter. But that is not what an average (mean) height tells you (it is basically what the median means, though if you want to get technical, it doesn’t mean exactly that). You might also draw the conclusion that the average height of an adult in California is 5 feet 9 inches, but that is not supported by data that only says what the average height of an adult in the country is. The same holds for drawing the conclusion that 5 feet 9 inches is the average height of a woman. Now in this simple example, hopefully people can see the faulty reasoning, but such reasoning often goes on without consideration.
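The height example above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the heights are invented purely for illustration (they are not real survey data), and the group labels are assumptions made only to show how a subgroup's mean can differ from the overall mean.

```python
from statistics import mean, median

# Hypothetical heights in inches, invented for illustration only.
heights = {
    "women": [61, 62, 63, 64, 65],
    "men": [66, 67, 68, 69, 85],  # one very tall person pulls the mean up
}

everyone = heights["women"] + heights["men"]

overall_mean = mean(everyone)
overall_median = median(everyone)

# Fraction of people strictly shorter than the mean.
share_below_mean = sum(h < overall_mean for h in everyone) / len(everyone)

# The mean of one subgroup is not determined by the overall mean.
women_mean = mean(heights["women"])

print("overall mean:", overall_mean)
print("overall median:", overall_median)
print("share below the mean:", share_below_mean)
print("mean for women only:", women_mean)
```

With these made-up numbers the overall mean is 67 inches, yet 60% of the group is shorter than that, and the mean for the women alone is 63 inches. The overall mean is correct in every case; only the conclusions drawn from it would be wrong.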

In a great speech, Marissa Mayer speaks of how Google makes decisions using data, and says that data is apolitical. One benefit of this, she says, is that Google makes decisions based on what the data supports, not on political considerations. The belief that basing decisions on what the data supports leads to better decisions can seem false to those who accept the quote about 3 types of lies (or to those who see a weakness in this point when the people supposedly basing decisions on data don’t really understand how to do so).

If there is a lack of understanding of data then people can manipulate the data (choose the data that supports their claim, use false data, mislead about what the data really represents) to support their “political” end. They can also draw a faulty conclusion that they claim the data supports (either due to incompetence in understanding what conclusions could be supported by the data or by intentionally claiming a conclusion they know is not supported) and not have that conclusion challenged.

Often people will try to pick certain data that they believe supports their conclusion. Even this act of deceit is not a case of the data lying. And often, the data they left out is obviously missing (though if people don’t know how to think about data they might not notice). It is just as if someone opened one door on the first floor of a house and said: see, here is the only bedroom. I showed this is true because I just opened this door and there is only one bedroom. Showing me what was behind one door on the first floor, and that it was a bedroom, in no way indicates that the rooms upstairs are not bedrooms. What about the second floor that I could obviously see from the outside of the house? This is not about being untrusting, just about noticing obvious flaws in reasoning. The conclusion that there are no more bedrooms, drawn from showing one door, is not sensible.
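The same "one door" problem can be shown with numbers. This is a rough sketch in Python, assuming a hypothetical record of monthly defect counts (the figures, the month labels, and the `change` helper are all invented for illustration). It compares the conclusion drawn from a hand-picked window with the one drawn from the full record, and lists the months that were never shown.

```python
# Hypothetical monthly defect counts, invented for illustration only.
defects = {"Jan": 40, "Feb": 42, "Mar": 45, "Apr": 30, "May": 47, "Jun": 50}


def change(counts, first, last):
    """Return the change in the count from month `first` to month `last`."""
    return counts[last] - counts[first]


# The presenter shows only the flattering window.
shown = ["Mar", "Apr"]
picked_change = change(defects, shown[0], shown[-1])

# The full record tells a different story.
full_change = change(defects, "Jan", "Jun")

# The "rooms upstairs": months that exist in the record but were not shown.
missing = [month for month in defects if month not in shown]

print("change in the picked window:", picked_change)
print("change over the full record:", full_change)
print("months not shown:", missing)
```

In this made-up record the picked window shows defects falling by 15 while the full record shows them rising by 10. Every number is accurate; the misleading part is the choice of which numbers to show, and asking what is missing is what exposes it.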

If all those involved understand how to draw conclusions from data it is not easy to mislead them. They will notice if you try to pick only the data that supports your claim, etc. And they will notice if you failed to see something that the data shows that you did not consider. They will also notice if there are obvious gaps in the data that should be shown to support the conclusion you are drawing.

But when both sides do not understand data well they often see false claims made but cannot understand what is false (they don’t trust the person’s conclusion or the data but they don’t know how to question the claims made). Rather than become more educated on how to understand data they can fall back on things like claiming statistics lie. [See common errors in interpreting data, and the related links too.] Google is a rare place, where those trying to mislead by making false claims about the data (or trying to present a faulty picture by manipulating what data is chosen to make a case, claiming the data supports conclusions it does not…) would have a great deal of difficulty, because those they are speaking to have too much knowledge to miss manipulation of or sloppiness with the data itself, or poor conclusions that are not actually supported as the presenter claims.

Strive to get your organization to the point where decisions are based on sound interpretations of the data. Don’t let your organization be one where people are afraid of data because they draw the wrong conclusions when presented with data (and can’t see when someone else claims support from data for their conclusion, but the data actually does not support that claim). One easy way to manipulate data (or get lousy data that leads people to draw false conclusions) is to have sloppy (or, even more likely, non-existent) operational definitions.