Data Can’t Lie

Posted on August 9, 2007  Comments (7)

Many people state that data can lie. Obviously data can’t lie.

“There are three kinds of lies: lies, damned lies, and statistics” – popularized by Mark Twain, who attributed it to Benjamin Disraeli

Many people don’t understand the difference between being manipulated because they can’t understand what the data really says and the data itself “lying” (which, of course, doesn’t even make sense). The same confusion can come in when someone just draws the wrong conclusion from the data that exists (and then blames the data for “lying” instead of themselves for drawing a faulty conclusion). The data can be wrong (and the data can even be made faulty intentionally by someone). Or someone can draw the wrong conclusion from data that is correct. But in neither case is the data lying. It is also common to believe the data means something other than what it does (therefore leading to a faulty conclusion).

For a very simple example, consider believing that if the average height for adults in the USA is 5 feet 9 inches, then half the adults must be taller and half must be shorter than 5 feet 9 inches. But that is not what an average (mean) height means (it is basically what median means, though if you want to get technical, it doesn’t mean exactly that). You might also draw the conclusion that the average height of an adult in California is 5 feet 9 inches, but that is not supported by data that only says what the average height of an adult in the country is. The same holds for drawing the conclusion that 5 feet 9 inches is the average height of a woman. In these simple examples, hopefully people can see the faulty reasoning, but such reasoning often goes on without consideration.
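As a rough illustration of the mean versus median point, here is a minimal Python sketch. The height values are made up for this example (they are not real survey data) and were chosen so that a couple of very tall people pull the mean up to 69 inches (5 feet 9 inches).

```python
from statistics import mean, median

# Hypothetical sample of adult heights in inches (made up for illustration,
# not real data). Two very tall people pull the mean upward.
heights = [64, 65, 66, 66, 67, 67, 68, 68, 79, 80]

avg = mean(heights)    # arithmetic mean: sum of values / number of values
mid = median(heights)  # middle value: half the sample at or below, half at or above

# Fraction of people in the sample who are shorter than the mean.
share_below_avg = sum(h < avg for h in heights) / len(heights)

print(f"mean = {avg}, median = {mid}, share below mean = {share_below_avg:.0%}")
```

In this made-up sample the mean is 69 inches but the median is 67 inches, and 8 of the 10 people are shorter than the mean. Knowing the mean alone does not tell you that half the people are shorter than it.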

In a great speech, Marissa Mayer speaks of how Google makes decisions using data and says that data is apolitical. One benefit of this, she says, is that Google makes decisions based on what the data supports, not on political considerations. The belief that basing decisions on what the data supports leads to better decisions can seem false to those who accept the quote about 3 types of lies (or to those who see that there is some weakness to this point if those supposedly basing decisions on data don’t really understand how to do so).

If there is a lack of understanding of data then people can manipulate the data (choose the data that supports their claim, use false data, mislead about what the data really represents) to support their “political” end. They can also draw a faulty conclusion that they claim the data supports (either due to incompetence in understanding what conclusions could be supported by the data or by intentionally claiming a conclusion they know is not supported) and not have that conclusion challenged.

Often people will try to pick certain data that they believe supports their conclusion. Even this act of deceit is not a case of the data lying. And often, the missing data is obviously missing (though if people don’t know how to think about data they might not notice). It is just as if someone opened one door on the first floor of a house and said, “See, here is the only bedroom. I showed this is true because I just opened this door and there is only one bedroom.” Showing me what was behind one door on the first floor, even if it was a bedroom, in no way indicates that the rooms upstairs are not bedrooms. What about the second floor that I could obviously see from the outside of the house? This is not about being untrusting, just about noticing obvious flaws in reasoning. The conclusion that there are no more bedrooms, drawn from showing one door, is not sensible.
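The same kind of gap can be sketched with numbers. The following Python snippet is only an illustration: the before and after measurements are invented, and the "cherry-picked" selection rule is one assumed way someone might choose only the data that supports their claim.

```python
from statistics import mean

# Hypothetical paired measurements (made up for illustration, not real data).
before = [12, 15, 11, 14, 13, 16, 12, 15]
after = [13, 14, 12, 15, 12, 16, 13, 14]

# Using all of the data: the average change is small.
full_change = mean(after) - mean(before)

# Cherry-picked: keep only the pairs where the "after" value went up.
picked = [(b, a) for b, a in zip(before, after) if a > b]
picked_before = [b for b, _ in picked]
picked_after = [a for _, a in picked]
picked_change = mean(picked_after) - mean(picked_before)

print(f"all {len(before)} pairs: average change = {full_change}")
print(f"picked {len(picked)} pairs: average change = {picked_change}")
```

With these made-up numbers, the full data shows an average change of 0.125 while the 4 hand-picked pairs show an average change of 1.0. The data is the same in both cases. A listener who asks what happened to the other 4 pairs is doing the equivalent of asking about the second floor of the house.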

If all those involved understand how to draw conclusions from data it is not easy to mislead them. They will notice if you try to pick only the data that supports your claim, etc. And they will notice if you failed to see something that the data shows that you did not consider. They will also notice if there are obvious gaps in the data that should be shown to support the conclusion you are drawing.

But when both sides do not understand data well they often see false claims made but cannot understand what is false (they don’t trust the person’s conclusion or the data but they don’t know how to question the claims made). Rather than become more educated on how to understand data they can fall back to things like claiming statistics lie. See common errors in interpreting data (and the related links too). Google is a rare place, where those trying to mislead by drawing false claims about the data (or trying to present a faulty picture by manipulating what data is chosen to make a case, claiming the data supports conclusions it does not…) would have a great deal of difficulty, because those they are speaking to have too much knowledge to miss manipulation of or sloppiness with the data itself, or poor conclusions that are not actually supported as the presenter claims.

Strive to get your organization to the point where decisions are based on sound interpretations of the data. Don’t let your organization be one where people are afraid of data because they draw the wrong conclusions when presented with data (and can’t see when someone else claims support from data for their conclusion, but the data actually does not support that claim). One easy way to manipulate data (or get lousy data that leads people to draw false conclusions) is to have sloppy (or even more likely non-existent) operational definitions.

7 Responses to “Data Can’t Lie”

  1. Curious Cat Investing and Economics Blog » Misuse of Statistics - Mania in Financial Markets
    August 18th, 2007 @ 3:24 pm

    Not every system is defined by a normal distribution – it is common for distributions to be close to normal but there is no reason any system need be. Many statistical tools have as an underlying assumption that the system in question is a normal distribution…

  2. Curious Cat Science and Engineering Blog » Bigger Impact: 15 to 18 mpg or 50 to 100 mpg?
    February 23rd, 2008 @ 9:07 pm

    It is great how a little understanding of math can help you see the errors in your initial beliefs…

  3. Curious Cat Science and Engineering Blog » Atlantic Hurricane Season 2008
    November 1st, 2008 @ 10:45 am

    [...] Data can’t lie but mistaken assumptions can lead you to form mistaken impressions. If you believe the number of named storms = hurricane activity and then are surprised that in fact there was many more days of hurricane activity it is not because the data lied but because you didn’t understand what the data represented. [...]

  4. Curious Cat Science and Engineering Blog » Poor Reporting and Unfounded Implications
    December 9th, 2008 @ 3:24 pm

    [...] scientific experimentation is how we learn, not trying to find anything that support our opinions. Statistics don’t lie but ignorant people draw faulty conclusions from data (when they are innumerate – illiteracy with [...]

  5. Low Mortgage Rates Not Available to Everyone @ Curious Cat Economics Blog
    January 1st, 2009 @ 1:10 pm

    The lowest 30 Year fixed mortgage rates in 37 years is great news for those looking to buy a house or to re-finance. However, that truth (the lowest rate) masks another truth, that it is available to a somewhat limited pool of borrowers…

  6. Curious Cat Science and Engineering Blog » Statistical Errors in Medical Studies
    March 14th, 2010 @ 10:59 am

    This post collects some discussion on the systemic reasons for medical studies presenting misleading results from several blogs and studies…

  7. Indirect Improvement » Curious Cat Management Improvement Blog
    December 24th, 2013 @ 7:39 am

    […] Data Can’t Lie – Growing the Application of Management Improvement Ideas in Your Organization – Build […]
