Tag Archives: Statistics

How to Create a Control Chart for Seasonal or Trending Data

Lynda Finn, President of Statistical Insight, has written an article on how to create a control chart for seasonal or trending data (where there is an underlying structural variation in the data). Essentially you need to account for the structural variation to create the control limits for the control chart. She also provides a Minitab project file. Both are available for download from the Curious Cat Management Improvement Library.
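To make the idea concrete, here is a minimal sketch of one common way to account for a trend: fit a straight line, then build individuals-chart limits around that line from the moving range of the residuals. This is not necessarily the method in Lynda Finn's article or Minitab file. The function name and the data are made up for illustration, and a seasonal pattern would need a different model than a straight line.

```python
from statistics import mean

def trend_control_limits(values):
    """Fit a straight-line trend, then build individuals-chart limits around it."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    # Ordinary least-squares slope and intercept for the trend line.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(values, fitted)]
    # The average moving range of the residuals estimates short-term variation.
    moving_ranges = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    half_width = 2.66 * mean(moving_ranges)  # 2.66 = 3 / d2, with d2 = 1.128
    lower = [f - half_width for f in fitted]
    upper = [f + half_width for f in fitted]
    return fitted, lower, upper

# Made-up data: a steady upward trend with one unusual point.
data = [10.2, 11.1, 11.9, 13.2, 13.8, 15.1, 16.0, 21.5, 18.1, 18.9, 20.2, 21.0]
fitted, lower, upper = trend_control_limits(data)

# Indexes of points outside the trend-adjusted limits.
signals = [i for i, y in enumerate(data) if y < lower[i] or y > upper[i]]
print(signals)
```

Flat limits drawn around the overall mean would be stretched by the trend itself, so they would say little about which points are really unusual. Limits that follow the trend only react to departures from it.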

Related: Control Charts in Health Care – Common Cause Variation – Managing with Control Charts – Measurement and Data Collection – Fourth Generation Management

Statistics for Experimenters in Spanish

book cover of Estadística para Investigadores

Statistics for Experimenters, second edition, by George E. P. Box, J. Stuart Hunter and William G. Hunter (my father) is now available in Spanish.

You can find a bit more on the Spanish edition, in Spanish: Estadística para Investigadores: Diseño, innovación y descubrimiento, Segunda edición.

Statistics for Experimenters – Second Edition:

Catalyzing innovation, problem solving, and discovery, the Second Edition provides experimenters with the scientific and statistical tools needed to maximize the knowledge gained from research data, illustrating how these tools may best be utilized during all stages of the investigative process. The authors’ practical approach starts with a problem that needs to be solved and then examines the appropriate statistical methods of design and analysis.

* Graphical Analysis of Variance
* Computer Analysis of Complex Designs
* Simplification by transformation
* Hands-on experimentation using Response Surface Methods
* Further development of robust product and process design using split plot arrangements and minimization of error transmission
* Introduction to Process Control, Forecasting and Time Series

Book available via Editorial Reverte

Related: Statistics for Experimenters Review – Correlation is Not Causation – Statistics for Experimenters Data – posts on design of experiments

ASQ William Hunter Award 2008: Ronald Does

The recipient of the 2008 William G. Hunter Award is Ronald Does. The Statistics Division of the American Society for Quality (ASQ) decides the recipient based on the attributes that characterize the career of Bill Hunter (my father – John Hunter): consultant, educator for practitioners, communicator, and integrator of statistical thinking into other disciplines. In his acceptance speech Ronald Does said:

The first advice I received from my new colleagues was to read the book by Box, Hunter and Hunter. The reason was clear. Because I was not familiar with industrial statistics I had to learn this from the authors who were really practicing statisticians. It took them years to write this landmark book.

For the past 15 years I have been the managing director of the Institute for Business and Industrial Statistics. This is a consultancy firm owned by the University of Amsterdam. The interaction between scientific research and the application of quality technology via our consultancy work is the core operating principle of the institute. This is reflected in the type of people that work for the institute, all of whom are young professionals having strong ambitions in both the academic world and in business and industry.

The kickoff conference attracted approximately 80 statisticians and statistical practitioners from all over Europe. ENBIS was officially founded in June 2001 as “an autonomous Society having as its objective the development and improvement of statistical methods, and their application, throughout Europe, all this in the widest sense of the words.” Since the first meeting membership has grown to about 1300 from nearly all European countries.

Related: 2007 William G. Hunter Award – The Importance of Management Improvement – Resources on using statistical thinking to improve management

Full and Fractional Factorial Test Design

An Essential Primer on Full and Fractional Factorial Test Design

Since full factorial gathers additional data, it reveals all possible interactions, but as seen by the numbers above, there is a trade-off. More data equals more information but more data also equals a longer test duration. The minimum data requirements for full factorial are very high since you are showing every experiment.

Even if you are using full factorial to get the same amount of information as a fractional factorial test, it will take more time since you need more data to see statistically relevant differences between the many experiments. You might be wondering how fractional factorial can be accurate if interactions are possible?

Random interactions of high relevance are very rare, especially when looking for interactions of more than 2 factors. You really need to design tests where you look for meaningful interactions that are based on true business requirements rather than hoping for a random and low influence interaction between a red button, a hero shot and a headline.
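A small sketch may help show the trade-off described in the quote. It builds a full two-level factorial for three factors and a half fraction of it. The factor names are hypothetical (borrowed from the button, hero shot and headline mentioned above), and the half fraction uses the textbook defining relation I = ABC, which is an assumption for illustration and not something taken from the quoted article.

```python
from itertools import product

# Hypothetical two-level factors, coded -1 (low) and +1 (high).
factors = ["button", "hero_shot", "headline"]

# Full factorial: every combination of levels, 2**3 = 8 runs.
full = list(product([-1, 1], repeat=len(factors)))

# Half fraction: pick levels for the first two factors freely, then set the
# third to their product (defining relation I = ABC), giving 4 runs.
half = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

print(len(full), len(half))

# The price of fewer runs: in the half fraction the third factor's column is
# identical to the interaction column of the first two, so those two effects
# cannot be told apart from this data alone.
aliased = all(c == a * b for a, b, c in half)
print(aliased)
```

The fraction needs half the runs, but the main effect of the third factor is confounded with the interaction of the first two. That is an acceptable bet when, as the quote argues, large interactions are rare.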

I am a fan of design of experiments, as long-time readers know (see posts on design of experiments).

Some good resources for more on the topics discussed above: What Can You Find Out From 8 and 16 Experimental Runs? by George Box – Statistics for Experimenters – Design of Experiments in Advertising.

Related: Google Website Optimizer – factorial experiment articles – Using Design of Experiments – Marketers Are Embracing Statistical Design of Experiments

Stratification and Systemic Thinking

I am reading a fascinating book by Jessica Snyder Sachs: Good Germs, Bad Germs. From page 108:

At New York Hospital, Eichenwald and infectious disease specialist Henry Shinefield conceived and developed a controversial program that entailed deliberately inoculating a newborn’s nostrils and umbilical stump with a comparatively harmless strain of staph before 80/81 could move in. Shinefield had found the protective strain – dubbed 502A – in the nostrils of a New York Hospital baby nurse. Like a benign Typhoid Mary, Nurse Lasky had been spreading her staph to many of the newborns in her care. Her babies remained remarkably healthy, while those under the care of other nurses were falling ill.

This is a great example of a positive special cause. How would you identify it? First you would have to stratify the data. It also shows that sometimes looking at the who is important: it is possible to stratify the data by person to good effect. The problem is just that we far too often look at who instead of the system, so some people get the idea that it is never OK to stratify data based on who. It is OK; just be careful, because we often do it when it is not the right approach, and we can get fooled by random variation into thinking there is a cause (see the red bead experiment for an example).
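As a rough sketch of what stratifying by person could look like, the code below splits illness counts by nurse and compares each nurse's rate with 3-sigma limits around the overall rate (the same arithmetic as a p-chart). The counts are invented for illustration and are not from the book. The limits are there to guard against the red bead trap of reading a cause into ordinary random variation.

```python
from math import sqrt

# Invented counts for illustration: (babies cared for, babies who fell ill).
records = {
    "Nurse A": (60, 14),
    "Nurse B": (55, 12),
    "Nurse C": (58, 1),
    "Nurse D": (62, 15),
}

total_babies = sum(n for n, _ in records.values())
total_ill = sum(ill for _, ill in records.values())
p_bar = total_ill / total_babies  # overall illness rate across all nurses

unusual = []
for nurse, (n, ill) in records.items():
    rate = ill / n
    # 3-sigma limits for a proportion based on n babies.
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    lower = max(0.0, p_bar - 3 * sigma)
    upper = min(1.0, p_bar + 3 * sigma)
    if rate < lower or rate > upper:
        unusual.append(nurse)
    print(f"{nurse}: rate={rate:.3f} limits=({lower:.3f}, {upper:.3f})")

print(unusual)
```

With these made-up numbers only one nurse falls outside the limits, and on the low side. The differences among the other three are the kind that random variation alone would produce.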

The following 20 pages in the book are littered with very interesting details many of which tie to thinking systemically and the perils of optimizing part of the system (both when considering the system to be one person and also when viewing it as society).

I have recently taken to reading more and more about viruses, bacteria, cells, microbiology etc.: it is fascinating stuff.

Related: Science Books by topic – Data Can’t Lie – Understanding Data

Data Can’t Lie

Many people state that data can lie. Obviously data can’t lie.

There are three kinds of lies: Lies, damn lies and statistics – Mark Twain

Many people don’t understand the difference between being manipulated because they can’t understand what the data really says and data itself “lying” (which, of course, doesn’t even make sense). The same confusion can come in when someone just draws the wrong conclusion from the data that exists (and then blames the data for “lying” instead of themselves for drawing a faulty conclusion). The data can be wrong (and the data can even be made faulty intentionally by someone). Or someone can draw the wrong conclusion from data that is correct. But in neither case is the data lying. It is also common to believe the data means something other than what it does (therefore leading to a faulty conclusion).

For a very simple example, take the belief that if the average height for adults in the USA is 5 feet 9 inches then half the people must be taller and half must be shorter. You could then draw the conclusion that half the adults must be shorter than 5 feet 9 inches. But that is not what an average height means (it is basically what median means, though if you want to get technical, it doesn’t mean exactly that). You might draw the conclusion that the average height of an adult in California is 5 feet 9 inches, but that is not supported by data that only gives the average height of an adult in the country. The same holds for drawing the conclusion that 5 feet 9 inches is the average height of a woman. Now in this simple example, hopefully people can see the faulty reasoning, but such reasoning often goes on without consideration.
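The difference between average and median is easy to check with a few numbers. This small sketch uses made-up, deliberately lopsided values (not height data, which is much closer to symmetric) so that the gap between the two is obvious.

```python
from statistics import mean, median

# Made-up, right-skewed values chosen for illustration.
values = [1, 2, 2, 3, 3, 4, 20]

avg = mean(values)    # sum of the values divided by how many there are
mid = median(values)  # middle value once the values are sorted

# Share of values that fall below the average.
share_below_mean = sum(v < avg for v in values) / len(values)

print(avg, mid, round(share_below_mean, 2))
```

Here six of the seven values sit below the average, so "half are below average" is simply false for this data. The median is the number that splits the values in the middle.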

In a great speech, Marissa Mayer speaks of how Google makes decisions using data and says that data is apolitical. One benefit of this, she says, is that Google makes decisions based on what the data supports, not on political considerations. The belief that basing decisions on what the data supports leads to better decisions can seem false to those who accept the quote about 3 types of lies (or to those who see a weakness in this point if those supposedly basing decisions on data don’t really understand how to do so).


All Models Are Wrong But Some Are Useful

“All Models Are Wrong But Some Are Useful” -George Box

A great quote. Here is the source: George E.P. Box, Robustness in the strategy of scientific model building, page 202 of Robustness in Statistics, R.L. Launer and G.N. Wilkinson, Editors. 1979.

See more quotes by George Box.

Related: Dangers of Forgetting the Proxy Nature of Data – articles by George Box – Quotes by Dr. W. Edwards Deming

Performance Measures and Statistics Course

Performance Measures and Statistics Course [the broken link was removed] – free course materials from a 2 day training course by Steven Prevette. Topics include: Dr. Deming’s red bead experiment, operational definitions, selecting performance targets, SPC, theory of variation, case studies, control charts, PDSA, Pareto charts and histograms.

Related: Quality, SPC and Your Career – articles by Steven Prevette

The Exciting Life of Industrial Statisticians

Never a Dull Day: The Life of an Industrial Statistician by Gerry Hahn.

Gerry Hahn was one of the great applied statisticians of the last 50 years, working at GE for over 45 years. Six sigma has many variants; he is one of those who understood how to apply it well.

All of this provides great new opportunities for industrial statisticians to serve as statistical leaders – a term popularized by the late and great Ed Deming (see Hahn and Hoerl, 1998). Statistical leaders engage principally in leveraging statistical concepts and thinking (see Hoerl, Hooper, Jacobs and Lucas, 1993), and focus their activities on mentoring and supporting the most business-vital and technically challenging problems dealing with getting the right data, and converting such data into actionable information.

In 1991 Dr. Hahn received the Hunter Award from the ASQ Statistics Division (the award is named for my father – John).