Managing to Test Result Instead of Customer Value

Computer hardware and software creators use benchmarks as one tool to compare the performance of alternative products. At times this can be very useful. You can learn which software or hardware is faster, and that may be a very valuable factor. However, any measure is determined by the operational definitions used in collecting it. And if people have incentives to improve the measured number, they will often do just that (improve the measure) rather than improve the system (the measure is meant to serve as a proxy for some function of that system).

Information technology people actually understand this much better than most managers (who also rely on measures for many things, like return on equity, profit growth, productivity of various plants…), so I find they are not fooled by measures nearly as often as managers are. On Reddit there is an interesting discussion on coding the product to provide good benchmark results (in this context benchmarking has to do with measured results on standard performance tests, not TQM-style benchmarking). The technical details in this case don’t matter so much to my point, which is just that it is risky when people treat the measure as the true value instead of as a proxy for the true value.

Technology companies compete fiercely, and claiming that your software or hardware is faster is one big area of competition. The comment on Reddit claims that one competitor changed some code only to get a better measure (a change that provides no benefit to customers). The problem with such actions is that they provide no actual value: all they do is make the measure less meaningful as a proxy.

Now it is also perfectly understandable why it would be done: when you are focused on improving the number, it might well be easier to distort the system to provide a better number (the one used to measure performance) instead of actually improving the performance. It is easy to see why a company would do this if they want marketing to claim their products are the fastest.
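To make the idea concrete, here is a minimal sketch of what "distorting the system to get a better number" can look like in code. It is purely hypothetical: the function names, the input, and the "units of work" measure are my own assumptions for illustration, not the code discussed on Reddit.

```python
# Hypothetical illustration only: none of these names come from the Reddit discussion.

BENCHMARK_INPUT = tuple(range(1000))  # the one input the published test uses


def honest_sum_of_squares(values):
    """Do the real work; return (result, number of multiplications performed)."""
    total = 0
    work = 0
    for v in values:
        total += v * v
        work += 1
    return total, work


# Answer precomputed for the benchmark input only.
_CACHED_BENCHMARK_ANSWER = sum(v * v for v in BENCHMARK_INPUT)


def gamed_sum_of_squares(values):
    """Special-case the benchmark input; every other input is handled as before."""
    if tuple(values) == BENCHMARK_INPUT:
        return _CACHED_BENCHMARK_ANSWER, 0  # reports zero work on the test
    return honest_sum_of_squares(values)


def measure(fn, values):
    """The proxy: units of work reported for one run (lower looks 'faster')."""
    _, work = fn(values)
    return work


customer_input = tuple(range(1, 1001))  # a realistic input that is not the benchmark

benchmark_honest = measure(honest_sum_of_squares, BENCHMARK_INPUT)
benchmark_gamed = measure(gamed_sum_of_squares, BENCHMARK_INPUT)
customer_honest = measure(honest_sum_of_squares, customer_input)
customer_gamed = measure(gamed_sum_of_squares, customer_input)

print("benchmark:", benchmark_honest, "->", benchmark_gamed)
print("customer: ", customer_honest, "->", customer_gamed)
```

In this sketch the benchmark number improves dramatically while the work done for a customer's input does not change at all. The measure got better; the system did not.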

Proxies are not perfect and it is often easier to improve the measure rather than the system. Anyone who creates incentives (bonuses, quotas, threats…) to meet numbers will create many people managing to the number target, often to the detriment of the entire system (in the example noted above, there is not much of a reduction in system performance, but in many cases there is: Why Setting Goals can Backfire).

A comment on the original post shows someone who understands the system (and shows how using systems thinking can provide great benefit with little work). They figured out how to make the speed at which others process their code part of the benchmark. Therefore others had an incentive to specifically target changes that directly benefited them. Smart.

The most clever thing I’ve ever seen with benchmarks turned that around. That is, instead of changing the code to do well on the benchmarks, the author changed the benchmarks so that people would make computers and processors to run his code well.

The code was Dave Fotland’s “Many Faces of Go”, which was one of the top computer Go programs at the time. Fotland donated a simplified version of his evaluation function to the SPEC people, who made it one of the tests in the SPEC suite.

SPEC was a major comparison used to compare processors, and to compare compilers on a given processor. Thus, Intel and AMD and Sun and IBM all spent considerable effort making SPEC run fast on their new hardware–and thus making Fotland’s evaluation function run fast.

Same for compiler writers. They made sure their compilers generated very good code for SPEC, and hence for Fotland’s evaluation function.

When I can’t get people to understand the problems with focusing primarily on a measure (that is a proxy for the real results we want), I try to think about the system and figure out how to design the measures with an understanding of systems thinking (like above) and psychology, to limit the harm done by focusing primarily on proxies instead of the most important results (that are often very hard to measure).

Previous posts on this topic: Metrics and Software Development, Measurement and Data Collection, Understanding Data, The Defect Black Market


Responses to Managing to Test Result Instead of Customer Value

  1. Marc Hersch says:

    John, I like your idea of “proxy for the real results”. It strikes me that in some part, the use of proxy measures represents a habit of behavior in which we abbreviate experience and observations in the interest of efficiency. It is easy for the meaning behind our abbreviations to become lost in the rush to get the job done. Maybe there is some method that we could use to remind ourselves of the meanings behind the measures we choose to use as proxies. This kind of “meaning tickler” would be designed to force us to think critically about the proxy measures employed.

    Habituation to symbolic proxies for complex ideas (benchmark measures, 3-letter acronyms, tech-speak, etc.) creates efficiency, but it also diminishes our powers of observation, study, and understanding.

  2. This is a complex problem but a very pervasive one. While you’re concentrating on its existence in software, it of course is also true in any measurement environment. I’ve seen companies set up whole departments whose sole function appears to be developing metrics that make people look good, instead of figuring out how to be better.

  3. Justin H. says:

    Very cool post.

    I especially liked your point that: “When I can’t get people to understand the problems with focusing primary a measure (that is a proxy for the real results we want), I try to think about the system and figure out how to design the measures with an understanding of systems thinking (like above) and psychology to limit the harm done by focusing primarily on proxies instead of the most important results (that are often very hard to measure).”

    There is a good, somewhat related article at:

    http://www2.computer.org/cms/Computer.org/ComputingNow/homepage/2009/0709/rW_SO_Viewpoints.pdf

    In it, Tom DeMarco – a high-visibility proponent of rigorously measuring Software Engineering metrics for the last 40 years – largely recants his earlier advocacy for the widespread use of metrics. I like your approach better.

    – Justin Hunter
    Founder of http://www.hexawise.com
