The End of Theory: The Data Deluge Makes the Scientific Method Obsolete by Chris Anderson
So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.
Speaking at the O’Reilly Emerging Technology Conference this past March, Peter Norvig, Google’s research director, offered an update to George Box’s maxim: “All models are wrong, and increasingly you can succeed without them.”
There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
See the update below: Norvig was misquoted, and he agrees with Box’s maxim.
I must say I am not at all convinced that a new method without theory is ready to supplant the existing scientific method. I can’t find Peter Norvig’s exact words online (come on Google – organize all the world’s information for me please). If he said that using massive stores of data to make discoveries in new ways is radically changing how we can learn and create useful systems, that I believe. I do enjoy the idea of trying radical new ways of viewing what is possible.
Practice Makes Perfect: How Billions of Examples Lead to Better Models (summary of his talk on the conference web site):
Related: Will the Data Deluge Makes the Scientific Method Obsolete? – Pragmatism and Management Knowledge – Data Based Decision Making at Google – Seeing Patterns Where None Exists – Manage what you can’t measure – Data Based Blathering – Understanding Data – Webcast on Google Innovation
The Google Way of Science by Kevin Kelly
One example is Google’s translation function, which allows Chinese texts to be translated into English, for example. In Chinese, multiple symbols that mean something on their own can also be combined to create a single word. Google segments Chinese texts by comparing a large amount of Chinese and English versions of the same content to increase the probability that the Chinese characters will match the English words.
www.google.com/corporate, first sentence, September 2008:
Update: Peter Norvig has posted a correction. He did not say what was quoted:
Is it true that at your ETech presentation in March, you said, in a direct homage to George Box, “All models are wrong, and you don’t need them anyway”? Is that accurate?
Great, I thought–Wired is a publication with integrity and wants to get the facts right. I wrote back:
The quote I used was “essentially all models are wrong, but some are useful”.
The point I was making — and I don’t remember the exact words — was that if the model is going to be wrong anyway, why not see if you can get the computer to quickly learn a model from the data, rather than have a human laboriously derive a model from a lot of thought.
I figured they would either use the quote I gave them, paraphrase it, or drop it completely if it didn’t fit with the point of the story. But when Chris Anderson’s story The End of Theory: The Data Deluge Makes the Scientific Method Obsolete came out in June 2008, there was a fourth possibility that I hadn’t even counted upon: they attributed to me a made-up quote that actually contradicts the reply I gave them:
Peter Norvig, Google’s research director, offered an update to George Box’s maxim: “All models are wrong, and increasingly you can succeed without them.”
To set the record straight: That’s a silly statement, I didn’t say it, and I disagree with it.
The ironic thing is that even the article’s author, Chris Anderson, doesn’t believe the idea. I saw him later that summer at Google and asked him about the article, and he said “I was going for a reaction.” That is, he was being provocative, presenting a caricature of an idea, even though he knew the idea was not really true. That’s a mode I expect from other publications, but it’s not what I want from Wired, and I don’t expect Wired to make up facts to support their caricature.