There is no true value of anything: data has meaning based on the operational definition used to calculate the data.
Walter Shewhart’s Statistical Method from the Viewpoint of Quality Control, foreword by W. Edwards Deming:
Dr. Deming’s ideas on the theory of knowledge are the least understood and least seen in other management systems. The importance of understanding what data does, and does not, tell you is at least somewhat acknowledged in other management systems but is often not found much in the actual practice of management. The execution often glosses over the importance of actually understanding statistics versus merely using formulas. Just using formulas is dangerous. It may be inconvenient, but learning about the traps we can fall into in using data is important.
How often do you see the operational definition used to calculate the data provided along with the data itself?
via: Shewhart, Deming and Data by Malcolm Chisholm
Related: How We Know What We Know – Pragmatism and Management Knowledge – Measuring and Managing Performance in Organizations – Dangers of Forgetting the Proxy Nature of Data
Excellent post. This is why six sigma was at the same time good and bad where I learned it and helped implement it. We required people to clearly develop, expose to public view for review, and then do a validation of their op def (it was step 2 of twelve in our process – actually the validation part was step 3). People had been running around with data that had loose or no real operational definition for years. My opinion is that data that is not thoroughly understood as to what it represents (no operational definition), or that comes from an unvalidated (statistically or otherwise) measurement system, is just that – data – useless at best, dangerously misleading at worst. If you have a strong and logical operational definition, a valid way to measure, and you have gone to the place where the data is an actual event, watched quietly at first and asked questions second, then you have EVIDENCE, not data. Evidence is good and useful. That was one of the good parts of six sigma for us.
One of the bad parts was that we really didn’t teach statistics very well. We taught people how to put their evidence (hopefully not data, by my definition) into a software program, and a p value or transfer function came out that determined their actions for the next several steps of the process. This is really bad. People at minimum have to understand the experimental error in their procedures and be able to assess residuals. There are more things too, but without these you have no idea how much variation remains unaccounted for in your model, and you don’t even know if the basic assumptions of your procedure are satisfied. This is hard to train and probably requires a mindset of education instead of pure training, and unfortunately that is not as cheap and easy as many organizations would like.
Great and important post!
John,
Interesting thought, I would love to hear you expand upon it. I love statistics, which I know makes me a geek. One complaint I have is that in all the six sigma training there is a focus on PROCESSING data but not really interpreting or adding meaning to data. There seems to be a pattern of people believing their data is the one true reality, instead of just one representation of reality. This is a dangerous trap.
Well, there are lies, damned lies and statistics, sure enough, and this post highlights the inevitable variation that exists irrespective of any motivation to manipulate.
I don’t like the implication that statistics lie. Data doesn’t lie – but those who don’t understand statistics can make false assumptions and be encouraged to draw false conclusions by those who try to mislead them.
Those who understand how to draw conclusions from data are not easy to mislead (the misleading is not a result of the data but of the knowledge of the person someone attempts to mislead). They will notice if you try to pick only the data that supports your claim, etc. One easy way to manipulate data (or get lousy data that leads people to draw false conclusions) is to have sloppy (or, even more likely, non-existent) operational definitions.
You get the gist, though, John. It’s a century-old quote usually attributed to Disraeli. The point is that selective use of statistics can skew an argument one way or the other. It is always difficult to know whether you have got hold of all the relevant management information, because you don’t know what you don’t know, but people generally make their decisions and choices based on the information they have at hand. Governments, for example, have made, and perhaps always will make, selective use of data to influence public opinion in a particular direction.
I think for that reason it is more than just a tough call for those who don’t understand the data, because often the totality of relevant data is deliberately withheld, making a truly informed choice impossible.
In actual fact, I think governments rely on a public assumption that data doesn’t lie. So in summary, I guess it is possible to disguise the truth with data, and impossible to know the truth without it.
Statistics don’t lie. People can lie using statistics. People can make mistakes when they do statistics but that isn’t lying. People who don’t understand what significance level means might think that statistics lie, but inferential statistics even tell you the probability that they are misleading you in the alpha and beta risks. Liars aren’t nearly that transparent.
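The claim that inferential statistics announce their own alpha risk can be demonstrated with a small simulation (a sketch not in the original post; the sample sizes and distributions are chosen for illustration). When the null hypothesis is actually true, a test run at a 5% significance level should flag a "difference" about 5% of the time, just as advertised:

```python
import random
import statistics

random.seed(1)

# Simulate many experiments where the null hypothesis is TRUE:
# both samples come from the same distribution, so any "significant"
# result is a false positive (the alpha risk).
alpha = 0.05
trials = 2000
n = 30
false_positives = 0

for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    if abs(diff / se) > 1.96:  # approximate critical value for alpha = 0.05
        false_positives += 1

# The observed false-positive rate comes out close to alpha: the
# procedure tells you up front how often it will mislead you when
# there is no real effect.
rate = false_positives / trials
```

A liar picks and chooses which results to show; the test, by contrast, states its error rate in advance.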
Pingback: Curious Cat Management Improvement Blog » How to Manage What You Can’t Measure
Pingback: Curious Cat Management Improvement Blog » No True Lean Thinking or Agile Software Development
Pingback: There is No Such Thing as “True Unemployment Rate” at Curious Cat Investing and Economics Blog