Tag Archives: Statistics

Poorly Stratified Data Leads to Mistakes in Analysis

Getting organizations to think of data as critical to making effective decisions is often a challenge. But the very next problem is that once data is used, it is misused more often than it is used well.

How Not to be Wrong (book cover)

What is important is not just having numbers mentioned when decisions are being made, or even having numbers mentioned when those decisions are evaluated after they have been implemented (of course, many organizations don’t even evaluate the results of many changes they adopt, but that is a different problem). What is important for “evidence based decision making” is that what those numbers actually mean must be understood. It is easy to be misled if you don’t think critically about what the numbers tell you and what they do not.

Poorly stratified data is one problem that leads to mistakes in analysis.

How ZIP codes nearly masked the lead problem in Flint

As I ran the addresses through a precise parcel-level geocoding process and visually inspected individual blood lead levels, I was immediately struck by the disparity in the spatial pattern. It was obvious Flint children had become far more likely than out-county children to experience elevated blood lead when compared to two years prior.

How had the state so blatantly and callously disregarded such information? To me – a geographer trained extensively in geographic information science, or computer mapping – the answer was obvious upon hearing their unit of analysis: the ZIP code.

Their ZIP code data included people who appeared to live in Flint and receive Flint water but actually didn’t, making the data much less accurate than it appeared [emphasis added].

This type of assumption about data leading to mistakes in analysis is common. The act of using data doesn’t provide benefits if the data isn’t used properly. The more I see of the misuse of data, the more importance I place on the skill of thinking critically. We must challenge assumptions and challenge what the data we look at actually means.
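
To make the stratification problem concrete, here is a minimal Python sketch. Every number in it is invented for illustration (it is not the Flint data), and the ZIP code, field names and helper functions are assumptions. It shows how one ZIP code that straddles a city boundary can blend a high rate and a low rate into an unremarkable average.

```python
# Hypothetical records for a single ZIP code that straddles a city boundary.
# All counts are made up to illustrate the effect; they are not the Flint data.

def make_records(zip_code, on_city_water, n_total, n_elevated):
    """Build n_total records, the first n_elevated of which are elevated."""
    return [
        {"zip": zip_code, "city_water": on_city_water, "elevated": i < n_elevated}
        for i in range(n_total)
    ]

def elevated_rate(records):
    """Share of records with an elevated blood lead result."""
    return sum(1 for r in records if r["elevated"]) / len(records)

records = (
    make_records("00001", True, 400, 40)     # on city water: 40 of 400 elevated
    + make_records("00001", False, 600, 12)  # other water source: 12 of 600 elevated
)

# Unit of analysis = ZIP code: everyone in the ZIP code is lumped together.
by_zip = elevated_rate(records)

# Stratified by the thing we actually care about: the water source.
city_only = elevated_rate([r for r in records if r["city_water"]])
outside = elevated_rate([r for r in records if not r["city_water"]])

print(f"whole ZIP code:  {by_zip:.1%}")
print(f"city water only: {city_only:.1%}")
print(f"other water:     {outside:.1%}")
```

With these made-up counts the whole ZIP code shows 5.2% elevated, while the households on city water show 10.0% and the rest show 2.0%. The ZIP-level figure hides a fivefold difference between the two groups.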

Don’t Expect Short Quotes to Tell the Whole Story

When people try to use a short quote as an accurate encapsulation of a management concept they will often be disappointed.

It is obvious that Dr. Deming believed that organizations that failed to use data effectively to improve needed to change and use data effectively in order to thrive over the long term. He believed that greatly increasing the use of data in decision making would be useful. He also believed there were specific problems with how data was used, when it was used. Failing to understand variation leads to misinterpreting what conclusions can appropriately be drawn from data.

Using data is extremely useful in improving performance. But as Deming quoted Lloyd Nelson as saying “the most important figures that one needs for management are unknown or unknowable.”

I believe Dr. Deming would have said something like “In God we trust, all others bring data” (I haven’t been able to find a source verifying he did say it). Others don’t believe he would have, referencing the Lloyd Nelson quote and all of Deming’s other work showing Dr. Deming’s opinion that data isn’t all that matters. I believe they are correct that Dr. Deming wouldn’t mean for the quote to be taken literally as a summation of everything he ever said. That doesn’t mean he wouldn’t use a funny line that emphasized an important message – we need to stop relying so much on unsubstantiated opinion and instead back up opinion with data (including experiments).

Quotes can help crystallize a concept and drive home a point. They are very rarely a decent way to pass on the whole of what the author meant, which is why context is so important. But most often quotes are shared without context, and that, of course, leads to misunderstandings.

image of quote - It is wrong to suppose that if you can't measure it, you can't manage it - a costly myth.

A funny example of this is the Deming quote that you often see: “if you can’t measure it, you can’t manage it.” Deming did actually say that. But without the context you get 100% the wrong understanding of what he said. Deming’s full statement is “It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth.” Now normally much more context is required to truly understand the author’s point. But this is a funny example of how a quote can even be accurate when passed on to you and yet be completely misleading because it is taken out of context.

All Data is Wrong, Some is Useful

From my first blog post on this blog – Dangers of Forgetting the Proxy Nature of Data

we often fail to explore whether changes in the numbers (which we call results) are representative of the “true results” of the system or if the data is misleading.

Data is meant to provide us insight into a more complex reality. We need to understand the limitations when we look at “results” and understand data isn’t really the results but a representation we hope is close to reality so we can successfully use the data to make decisions.

But we need to apply thought to how we use data. Lab results are not the same as what happens in the field. It is cheaper and faster to examine results in a lab. But relying on lab results involves risk. That doesn’t mean relying on lab results is bad; we have to balance the costs and benefits of getting more accurate data.

But relying on lab results and not understanding the risk is dangerous. This is the same idea as going to the gemba to get an accurate understanding instead of relying on your ability to imagine reality based upon some data and ideas of what it is probably like.

photo of a Modified Yellow VW Beetle

VW Beetle (in Bangkok, Thailand) has some sort of modification along the back bumper but I don’t know what it is meant to do. Any ideas? More of my photos from Bangkok.

Volkswagen Drops 23% After Admitting Diesel Emissions Cheat

Volkswagen AG lost almost a quarter of its market value after it admitted to cheating on U.S. air pollution tests for years

During normal driving, the cars with the software — known as a “defeat device” — would pollute 10 times to 40 times the legal limits, the EPA estimated. The discrepancy emerged after the International Council on Clean Transportation commissioned real-world emissions tests of diesel vehicles including a Jetta and Passat, then compared them to lab results.

Obviously VW was managing-to-test-result instead of real world value. It seems they were doing so intentionally to provide misleading data. Obviously one of the risks with lab test results (medical trials etc.) is that those with an interest in showing better results could manipulate the data and lab procedures (or systems) to have the data show their product in the most favorable light.

George Box Webcast on Statistical Design in Quality Improvement

George Box lecture on Statistical Design in Quality Improvement at the Second International Tampere Conference in Statistics, University of Tampere, Finland (1987).

Early on he shows a graph showing the problems with American cars holding steady over a 10-year period. Then he overlays the results for Japanese cars, which show a steady and significant decline over the same period.

Those who didn’t get to see presentations before PowerPoint also get a chance to see old school, hand drawn, overhead slides.

He discusses how to improve the pace of improvement. To start with, informative events (events we can learn from) have to be brought to the attention of informed observers. Otherwise only when those events happen to catch the attention of the right observer will we capture knowledge we can use to improve. This results in slow improvement.

A control chart is an example of highlighting that something worth studying happened. The chart will indicate when to pay attention. And we can then improve the pace of improvement.
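
As a rough illustration of how a chart can "indicate when to pay attention," here is a minimal Python sketch of an individuals (XmR) control chart. The measurements are invented and the function names are assumptions; the limits use the standard individuals-chart constant 2.66 applied to the average moving range. This is one common type of control chart, not necessarily the one shown in the lecture.

```python
def xmr_limits(values):
    """Return (lower, center, upper) control limits for an individuals chart."""
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard individuals-chart constant (3 / 1.128).
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr

def signals(values, lower, upper):
    """Indexes of points outside the control limits (possible special causes)."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical baseline measurements from a stable process.
baseline = [12, 14, 11, 13, 15, 12, 14, 13, 12, 14]
lower, center, upper = xmr_limits(baseline)

# New measurements checked against the baseline limits.
new_points = [13, 12, 22, 14]
flagged = signals(new_points, lower, upper)

print(f"limits: {lower:.2f} to {upper:.2f} (center {center:.2f})")
print("points worth studying:", flagged)
```

With this made-up data the limits come out to 7.68 and 18.32, so only the third new point (22) is flagged. That point is the informative event an informed observer should look into while the conditions are still fresh.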

Next we want to encourage directed experimentation. We intentionally induce informative events and pay close attention while doing so in order to learn.

Every process generates information that can be used to improve it.

He emphasizes the point that this isn’t only about manufacturing but is true of any process (drafting, invoicing, computer service, checking into a hospital, booking an airline ticket, etc.).

He then discusses an example from a class my father taught, where the students all went to visit a TV plant outside Chicago. The plant had been run by Motorola. It was sold to a Japanese company that found there was a 146% defect rate (which meant most TVs were taken off the line to be fixed at least once and many twice) – and this is just the defect rate before they even got off the line. After 5 years the same plant, with the same American workers but a Japanese management system, had reduced the defect rate to 2%. Everyone, including the managers, was from the USA; they were just using quality improvement methods. We may forget now, but one of the many objections managers gave for why quality improvement wouldn’t work in their company was their bad workers (it might work in Japan but not here).

He references how Deming’s 14 points will get management to allow quality improvement to be done by the workforce, because without management support quality improvement processes can’t be used.

With experimentation we are looking to find clues for what to experiment with next. Experimentation is an iterative process. This is very much the mindset of fast iteration and minimum viable product (say, minimum viable experimentation, as voiced in 1987).

There is great value in creating iterative processes with fast feedback to those attempting to design and improve. Box and Deming (with rapid turns of the PDSA cycle) and others promoted this 20, 30 and 40 years ago and now we get the same ideas tweaked for startups. The lean startup stuff is as closely related to Box’s ideas of experimentation as an iterative process as it is to anything else.

Related: Ishikawa’s seven quality control tools

He also provided a bit of history that I was not aware of, saying the first application of orthogonal arrays (fractional factorial designs) in industry was by Tippett in 1933. He then mentioned work by Finney in 1945, Plackett and Burman in 1946 and Rao in 1947.
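
For readers who have not met the term, here is a minimal Python sketch of what makes a fractional factorial design an orthogonal array. It builds a textbook half fraction of a two-level, three-factor design using the generator C = A*B and checks that the columns are balanced and mutually orthogonal. The construction and the function name are illustrative assumptions; this is not a reconstruction of Tippett's 1933 design.

```python
from itertools import combinations, product

# Full two-level design in factors A and B, coded -1 (low) and +1 (high).
base = list(product([-1, 1], repeat=2))

# Half fraction of a 2^3 design: the third factor is set by the generator C = A*B,
# so three factors are studied in 4 runs instead of 8.
design = [(a, b, a * b) for a, b in base]

def is_orthogonal(runs):
    """True if every column sums to zero and every pair of columns has a zero dot product."""
    columns = list(zip(*runs))
    balanced = all(sum(col) == 0 for col in columns)
    uncorrelated = all(
        sum(x * y for x, y in zip(c1, c2)) == 0
        for c1, c2 in combinations(columns, 2)
    )
    return balanced and uncorrelated

for run in design:
    print(run)
print("orthogonal:", is_orthogonal(design))
```

Because each factor appears at its high and low level equally often against every level of the other factors, each main effect can be estimated independently from half the runs of the full design. The price is that C cannot be separated from the A-by-B interaction.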

George Box Articles Available for a Short Time

A collection of George Box articles has been selected for a virtual George Box issue by David M. Steinberg and made available online.

George E. P. Box died in March 2013. He was a remarkably creative scientist and his celebrated professional career in statistics was always at the interface of science and statistics. George Box, J. Stuart Hunter and Cuthbert Daniel were instrumental in launching Technometrics in 1959, with Stu Hunter as the initial editor. Many of his articles were published in the journal. Therefore we think it is especially fitting that Technometrics should host this on-line collection with some of his most memorable and influential articles.

They also include articles from Journal of the American Statistical Association and Quality Engineering. Taylor & Francis is offering these articles freely in honor of George Box until December 31st, 2014. It is very sad that closed science and engineering journals block access to the great work created by scientists and engineers and most often paid for by government (while working for state government universities and with grants from organizations like the National Science Foundation [NSF]). At least they are making a minor exception to provide the public (that should have unlimited access to these works) limited access to these articles this year. These scientists and engineers dedicated their careers to using knowledge to improve society, not to hide knowledge from society.

Some of the excellent articles made available for a short time:

The “virtual issue” includes many more articles.

Related: Design of Experiments: The Process of Discovery is Iterative - Quotes by George E.P. Box - The Art of Discovery - An Accidental Statistician: The Life and Memories of George E. P. Box

Analysis Must be Implemented by People to Provide Value

Guest Post by Bill Scherkenbach

photo of W. Edwards Deming with a cat

Every time I look at this picture, I think of Dr. Deming’s words to drive out fear and take joy in your work. We were talking in my home office when Sylvester saw a good lap and took it. Our conversation immediately shifted when both Dr. Deming and Sylvester started purring.

The greatest statistical analysis is nothing if it can’t be implemented by people. But people learn in different ways. Some like good stories, others like pictures. Only a few like equations. Dr. Deming always liked a good laugh; and a good purr.

By what method do you get your analyses implemented?

Bill Scherkenbach taught with Dr. Deming at the Deming 2 day seminars, received the Deming Medal and is the author of several books on Deming management principles.

Related: How to Get a New Management Strategy, Tool or Concept Adopted part 1 and part 2 - Getting Known Good Ideas Adopted - Respect People by Creating a Climate for Joy in Work - Playing Dice and Children’s Numeracy

Stu Hunter Discussing Bill Hunter, Statistics for Experimenters and EVOP

In this clip, Stu Hunter talks about Bill Hunter (my father, and no relation to Stu Hunter), Statistics for Experimenters and EVolutionary OPerations (EVOP).

Stu mentions Bill Hunter’s work with the City of Madison, which started with the First Street Garage (Out of the Crisis included a short write up on this effort by Dad, which, I believe, was the first application of Deming’s ideas in the public sector).

There was also a great deal of work done with the Police department, as the police chief, David Couper, saw great value in Deming’s ideas. The Police department did some great work and David’s blog shares wonderful ideas on improving policing. I don’t think Dad was that directly involved in what happened there, but it is one of the nice benefits of seeding new ideas: as they take root and grow wonderful things happen without any effort on your part.

As to why Dad got involved with the city, he returned from a summer teaching design of experiments and quality improvement methods in China (this was just before China was really open; a few outsiders were let in to teach). We had also lived overseas several other times, always returning to Madison. He decided he wanted to contribute to the city he loved, Madison, and so he talked to the Mayor about helping improve performance of the city.

The mayor listened and they started with a pilot project which Dad worked on with Peter Scholtes. Dad talked to Peter, who he had known for years, and who worked for the city, before talking to the mayor. Read more about the efforts in Madison via the links at the end of this post.

Design of Experiments: The Process of Discovery is Iterative

This video is another excerpt from the design of experiments videos by George Box; see previous posts: Introduction to Fractional Factorial Designed Experiments and The Art of Discovery. This video looks at learning about experimental design using paper helicopters (the paper linked there may also be of interest to you).

In this example a screening experiment was done first to find those factors that have the largest impact on results. Once the most important factors are determined more care can be put into studying those factors in greater detail.
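
A minimal Python sketch of the screening idea follows. The factor names and flight times are invented for illustration (they are not the results from the video), and the design is a small two-level full factorial rather than whatever design the video uses. For each factor it estimates the main effect as the average response at the high level minus the average at the low level, then ranks the factors by the size of that effect.

```python
from itertools import product

# Hypothetical paper helicopter factors, each run at a low (-1) and high (+1) level.
factors = ["wing_length", "body_width", "paper_clip"]
runs = list(product([-1, 1], repeat=len(factors)))

# Made-up flight times in seconds, one per run, in the same order as `runs`.
flight_times = [2.0, 1.8, 2.1, 1.9, 2.9, 2.7, 3.0, 2.8]

def main_effect(runs, responses, column):
    """Average response at the high level minus average response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[column] == 1]
    low = [y for run, y in zip(runs, responses) if run[column] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effects = {name: main_effect(runs, flight_times, i) for i, name in enumerate(factors)}

# Largest absolute effect first: these are the factors worth studying in more detail.
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
for name in ranked:
    print(f"{name:12s} {effects[name]:+.2f}")
```

With these invented numbers wing length dominates (about +0.90 seconds), while the paper clip (about -0.20) and body width (about +0.10) matter far less. The next round of experiments would then concentrate on wing length.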

The video was posted by Wiley (with the permission of George’s family). Wiley is the publisher of George’s recent autobiography, An Accidental Statistician, and many of his other books.

The importance of keeping the scope (in dollars and time) of initial experiments down was emphasized in the video.

George Box: “Always remember the process of discovery is iterative. The results of each stage of investigation generating new questions to be answered during the next.”

Soren Bisgaard and Conrad Fung also appear in this excerpt of the video.

The end of the video includes several suggested resources including: Statistics for Experimenters, Out of the Crisis and The Scientific Context of Quality Improvement.

Related: Introductory Videos on Using Design of Experiments to Improve Results (with Stu Hunter) - Why Use Designed Factorial Experiments? - brainstorming - What Can You Find Out From 12 Experimental Runs?

Stated Versus Revealed Preference

My father provided me a good example of the flawed thinking of relying on stated preference when I was growing up. Stated preference is, as you might deduce, the preference voiced by customers when you ask. This is certainly useful, but people’s stated preferences often do not match their actions. And for a business, actions that lead to customers are more important than claims potential customers make about what will make them customers.

His example was that if you ask people if clean restrooms are required for a restaurant they will say yes. Potential customers will say this is non-negotiable, it is required. But if you eat at many “ethnic restaurants,” as we always did growing up, you would see many popular restaurants did not have clean restrooms. If the food and atmosphere were good enough, clean restrooms were negotiable, even if customers stated they were not.

Now I think providing clean restrooms is a wise move for restaurants to make; it matters to people. Instead of creating a barrier to repeat customers that has to be overcome with much better food and atmosphere, it is wiser to give yourself every advantage by giving the customers what they want. But I think this is a simple example of stated versus revealed preferences.

McDonald’s gets a great deal of success by doing certain things well, including clean bathrooms, even if they miss on things some people think are important for a restaurant. McDonald’s really gets a fair amount of business from people driving a long distance who really want a clean bathroom, a quick stretch of their legs and quick food. This is a small percentage of McDonald’s customer visits but still a very large number of visits each day, I am sure. Understanding, and catering to, the problem your customers are trying to solve is important.

The point to remember is that what your potential customers say they will do is different from what they do. It is sensible to listen to the stated preferences of customers; just understand them for what they are.

We need to pay more attention to revealed preferences. Doing so can require putting in a bit more thinking than just asking customers to fill out a questionnaire. But it is worth the effort. A simple restaurant based example would be to have wait staff pay attention to what people leave on their plate. If you notice certain side dishes are left uneaten more often than others, look into that and see what can be done (improving how it is prepared, substituting something else…).
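
Tracking this kind of revealed preference does not need anything elaborate. Here is a minimal Python sketch, assuming wait staff jot down each side dish served and whether it came back mostly uneaten. The dishes, the observations and the 50% threshold are all made up for illustration.

```python
from collections import Counter

# Hypothetical tallies from wait staff: (side dish, came back mostly uneaten?)
observations = [
    ("fries", False), ("fries", False), ("fries", True), ("fries", False),
    ("coleslaw", True), ("coleslaw", True), ("coleslaw", False), ("coleslaw", True),
    ("salad", False), ("salad", False), ("salad", False), ("salad", True),
]

served = Counter(dish for dish, _ in observations)
uneaten = Counter(dish for dish, left in observations if left)

# Share of servings of each dish that came back mostly uneaten.
uneaten_rate = {dish: uneaten[dish] / served[dish] for dish in served}

# Arbitrary, assumed threshold for "look into this dish".
THRESHOLD = 0.5
to_review = [dish for dish, rate in uneaten_rate.items() if rate > THRESHOLD]

for dish, rate in uneaten_rate.items():
    print(f"{dish:10s} {rate:.0%} left uneaten")
print("look into:", to_review)
```

With these invented tallies the coleslaw comes back uneaten 75% of the time against 25% for the other sides, so it is the one to look into. With so few observations per dish this is only a prompt to investigate, not a conclusion.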

Related: Voice of the Customer - The Customer is the Purpose of Our Work - Customers Are Often Irrational - Packaging Affects Our Perception of Taste - Be Careful What You Measure

The Art of Discovery

Quality and The Art of Discovery by Professor George Box (1990):


Quotes by George Box in the video:

“I think of statistical methods as the use of science to make sense of numbers”

“The scientific method is how we increase the rate at which we find things out.”

“I think the quality revolution is nothing more, or less, than the dramatic expansion of scientific problem solving using informed observation and directed experimentation to find out more about the process, the product and the customer.”

“It really amounts to this, if you know more about what it is you are doing then you can do it better and you can do it cheaper.”

“We are talking about involving the whole workforce in the use of the scientific method and retraining our engineers and scientists in a more efficient way to run experiments.”

“Tapping into resources:

  1. Every operating system generates information that can be used to improve it.
  2. Everyone has creativity.
  3. Designed experiments can greatly increase the efficiency of experimentation.”

An informed observer and directed experimentation are necessary for the scientific method to be applied. He notes that the control chart is used to notify an informed observer to explain what is special about the conditions when a result falls outside the control limits. When the chart indicates a special cause is likely present (something not part of the normal system), an informed observer should think about what special cause could lead to the result that was measured. And it is important this is done quickly, as the ability of the knowledgeable observer to determine what is special is much greater the closer in time it is to when the result was created.

The video was posted by Wiley (with the permission of George’s family). Wiley is the publisher of George’s recent autobiography, An Accidental Statistician: The Life and Memories of George E. P. Box, and many of his other books.

Related: Two resources, largely untapped in American organizations, are potential information and employee creativity - Statistics for Experimenters (book on directed experimentation by Box, Hunter and Hunter) - Highlights from 2009 George Box Speech - Introductory Videos on Using Design of Experiments to Improve Results (with Stu Hunter)