Computer hardware and software creators use benchmarks as one tool to compare the performance of alternative products. At times this can be very useful: you can learn which software or hardware is faster, and that may be a very valuable factor. However, any measure is determined by the operational definitions used in collecting it. And if people have incentives to improve the measured number, they often will do just that (improve the measure) rather than improve the system the measure is meant to serve as a proxy for.
Information technology people actually understand this much better than most managers (who also rely on measures for many things, like return on equity, profit growth, productivity of various plants…), so I find they are not nearly as fooled by measures as managers are. On Reddit there is an interesting discussion of coding the product to provide good benchmark results (in this context benchmarking has to do with measured results on standard performance tests, not TQM-style benchmarking). The technical details in this case don't matter so much to my point, which is just that it is risky when people treat the measure as the true value instead of as a proxy for the true value.
Technology companies compete fiercely, and claiming that their software or hardware is faster is one big area of competition. The comment on Reddit claims one competitor changed some code only to get a better measure (a change that provides no benefit to customers). The problem with such actions is that they provide no actual value: all they do is make the measure less meaningful as a proxy.
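The details vary from case to case, but the pattern described can be sketched in a few lines. This is a hypothetical Python illustration I made up to show the idea, not the actual code from the Reddit discussion: the "optimized" routine recognizes the exact input the benchmark always uses and returns a precomputed answer, so the measured time improves while real customers see no benefit at all.

```python
# Hypothetical sketch of "coding to the benchmark" (illustration only,
# not the real code from the discussion).

BENCHMARK_INPUT = list(range(1000))   # the input the benchmark always feeds in
CACHED_BENCHMARK_RESULT = 499500      # answer precomputed ahead of time

def slow_sum(data):
    """The real implementation customers run -- unchanged and unimproved."""
    total = 0
    for x in data:
        total += x
    return total

def benchmarked_sum(data):
    """Shortcut the known benchmark input; everyone else gets the slow path."""
    if data == BENCHMARK_INPUT:
        return CACHED_BENCHMARK_RESULT  # looks instant on the benchmark
    return slow_sum(data)               # no improvement for real workloads
```

The benchmark number goes up, yet nothing about the system customers use has improved; the measure has simply been decoupled from what it was a proxy for.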
It is also perfectly understandable why this would be done: when you are focused on improving the number, it may well be easier to distort the system to produce a better number (the number used to measure performance) than to actually improve the performance. It is easy to see why a company would do this if it wants marketing to claim its products are the fastest.
Proxies are not perfect, and it is often easier to improve the measure than the system. Anyone who creates incentives (bonuses, quotas, threats…) to meet numbers will create many people managing to the number target, often to the detriment of the entire system (in the example noted above there is not much of a reduction in system performance, but in many cases there is: Why Setting Goals Can Backfire).
A comment on the original post shows someone who understands the system (and shows how systems thinking can provide great benefit with little work). They figured out how to get their own code included in the benchmark, so that improvements in how quickly others could process that code became part of the benchmark score. As a result, others had an incentive to target changes that directly benefited them. Smart.
The code was Dave Fotland’s “Many Faces of Go”, which was one of the top computer Go programs at the time. Fotland donated a simplified version of his evaluation function to the SPEC people, who made it one of the tests in the SPEC suite.
SPEC was a major benchmark suite used to compare processors, and to compare compilers on a given processor. Thus Intel, AMD, Sun, and IBM all spent considerable effort making SPEC run fast on their new hardware, and thus making Fotland's evaluation function run fast.
The same went for compiler writers: they made sure their compilers generated very good code for SPEC, and hence for Fotland's evaluation function.
When I can't get people to understand the problems with focusing primarily on a measure (which is a proxy for the real results we want), I try to think about the system and figure out how to design measures with an understanding of systems thinking (as in the example above) and psychology, to limit the harm done by focusing primarily on proxies instead of the most important results (which are often very hard to measure).