German utility E.On says major European blackout was caused by human error [the broken link was removed]
The Duesseldorf-based company said the power outage, which led to blackouts in parts of Germany, France, Belgium, Italy, Portugal and Spain on Nov. 4, was not caused by a lack of proper maintenance or enough investment in transmission grids and facilities.
The blackout was caused after a high-voltage transmission line over a German river was turned off in an aborted attempt to allow a newly built Norwegian cruise ship to pass safely under it.
That triggered a blackout that briefly left 10 million people without power, stopping trains in their tracks and trapping people in elevators.
OK, the focus seems to be that “we didn’t do anything wrong”, just some “human” made an error, with the implication that this is out of their control. Why would the organization not be responsible for the people and the system working together? Management needs to create systems that work. That system includes people, equipment, process management, suppliers…
E.ON says human error responsible for Nov 4 power outage [the broken link was removed]:
About half an hour later there was an outage at a second transmission line, which ultimately created a domino effect that led to the temporary disconnection of the European interconnected power grid.
The German utility said that all systems reacted in accordance with standard procedures, effectively preventing a complete blackout across Europe.
It seems obvious the process was not well designed if they believe a single mistake led to 10 million people being without power. Failing to admit that the process was designed poorly and needs to be improved is troubling. Blaming “human error” does not help explain this failure or prevent the next one (and is not a way to develop a culture that respects people). And it reinforces the notion that this event is due to one special cause (or 2…). It seems to me, even with this very little evidence at hand, that this is a system problem.
Why design a system where “human failure to check whether the outage of a second transmission line” would cause such a loss of power across Europe? By saying human error was involved, you imply the person should have done something differently, which means your system should have been designed to prevent such a mistake from occurring.
I don’t understand anything about power systems, so there could be numerous ways the system could be designed better (more robust, less susceptible to failure). But at the absolute least, given that they say it was an error not to check for the impact of a second line failure, why allow the line to be shut down without that check being done? Mistake proof the process.
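To make “mistake proof the process” concrete, here is a minimal sketch in Python of an interlock. It is an illustration only: I know nothing about how grid control software really works, and `SwitchingDesk`, `record_contingency_check` and `switch_off` are hypothetical names made up for this example. The point is just that the process itself refuses the switch-off until a passing check is on record, rather than relying on someone remembering to do it.

```python
class ContingencyCheckMissing(Exception):
    """Raised when a switch-off is requested without a passing check on record."""


class SwitchingDesk:
    """Hypothetical control desk; names and rules are illustrative assumptions."""

    def __init__(self):
        self._checked_lines = set()  # lines with a passing check on record
        self.lines_off = set()       # lines currently switched off

    def record_contingency_check(self, line_id, safe_if_second_line_fails):
        # Only a check that came back "safe" unlocks the switch-off.
        if safe_if_second_line_fails:
            self._checked_lines.add(line_id)
        else:
            self._checked_lines.discard(line_id)

    def switch_off(self, line_id):
        # The interlock: refuse unless the check has been done and passed.
        if line_id not in self._checked_lines:
            raise ContingencyCheckMissing(
                f"{line_id}: no passing second-line-failure check on record"
            )
        self._checked_lines.discard(line_id)  # each check unlocks one switch-off
        self.lines_off.add(line_id)
```

With something like this in place, skipping the check is no longer a silent omission: the attempt fails loudly, and the only way forward is to do the check.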
If management tries to claim a failure was due to “human error”, they have to provide me a great deal more evidence on why the system was designed to allow that error (calling the error “human” implies they believe the system itself should have been able to cope with the situation). Requesting that evidence is the first thing reporters should do any time they are given such excuses. At which point I imagine the response options are:
1) no comment
2) we had considered this situation, looked at the likelihood of such an event, the cost of protecting against it (mistake proofing) and the cost of failure, and decided that it wasn’t worth the cost of preventing such failures
3) we didn’t think about it
4) we think it is best not to design systems to be robust and mistake proof but rather rely on people to never make any mistakes
What they will likely say is we have these 3 procedures in place to prevent that error.
Are they ever followed? You have something written on paper, big deal. What actually happens?
Yes, they are always followed by everybody; this one time was the only time ever that they were not followed. Why?
This person made a mistake.
Why did the system allow that mistake to be made?
What? You can’t expect us to design systems that prevent mistakes from being made.
Yes I can. That is much more sensible than expecting people never to make a mistake.
For it really to be a “human error” I would expect that error proofing is in place. And that those measures are always followed. And that it would be obvious if such procedures were not followed. And that this was the one time they were not followed: the person made a mistake (or intentionally circumvented some process) in a way that 1) could not be predicted and 2) was not a standard workaround. One special note: if it was a training problem, that is not what I would call “human error”; that is a system problem. If it was a scheduling problem, meaning someone had to do a job they were not properly trained for and made an error, that is not really a “human error” either; that is a system problem in scheduling that created the conditions for the error.
Processes can slip out of alignment with the designed plan. They need to be monitored for this happening. If a process does slip, reassess whether there is a flaw in the original reasoning that means an improvement should be adopted, and how best to ensure such process designs can be made more effective.
Much more likely than “human error” is that the system was designed in a way that did not anticipate certain conditions. Those conditions all came together to lead to the bad result. The lesson is to put more effort into making the system robust and error proofing it. Most often you need to examine all the data, not just the data surrounding the event in question, and take action before significant breakdowns occur.
A simple related example. I can’t spell well. The process of posting to the blog allows me to post without doing a spell check. That is not ideal – I have to remember to spell check. And I could create a process that did not allow this but I am willing to take that risk. With email I have an option to require a spell check before the email is sent. That is a good mistake proofing option. If I could set WordPress to do a spell check before I post I would, but… If I submit a post with a misspelled word I would assign that error not to “human error” in misspelling (though it would be) but to a poorly designed process. The impact of how you classify the error is in how you will act. Performance will improve much more, almost all the time, if your fix is an improvement of the system and not just, in this case, telling myself: do not misspell, and avoid making such “human errors” from now on, John. Google toolbar is great for spell checking – I did remember this time 🙂
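For what it is worth, the spell check gate described above can be sketched in a few lines of Python. This is not how WordPress or any email program actually does it; `KNOWN_WORDS`, `misspelled_words` and `publish` are hypothetical names, and the tiny word list stands in for a real dictionary. It only shows the shape of the fix: the publish step itself runs the check, so there is nothing to remember.

```python
import re

# Tiny stand-in word list (an assumption); a real checker needs a full dictionary.
KNOWN_WORDS = {"a", "simple", "related", "example", "i", "can", "spell", "well"}


def misspelled_words(text, known_words=KNOWN_WORDS):
    """Return the words in text that are not in known_words, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in known_words]


def publish(text, known_words=KNOWN_WORDS):
    """Publish only if the spell check passes; otherwise refuse and say why."""
    unknown = misspelled_words(text, known_words)
    if unknown:
        raise ValueError("not published, check spelling of: " + ", ".join(unknown))
    return "published"
```

Calling `publish("A simpel example")` raises an error naming `simpel` instead of posting it. Whether to allow an override is a design choice; that is the same cost versus risk trade-off as option 2 in the list earlier.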