Curious Cat Management Improvement Blog: Deming, lean thinking, innovation, customer focus, continual improvement, six sigma.
July 26, 2008

Amazon S3 Failure Analysis

Amazon Simple Storage Service (S3) is an online storage service for web applications. The cloud computing solution has been used successfully by many organizations. However, it has experienced some problems, including failing for much of the day on July 20th.

Amazon S3 Availability Event

We’ve now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers’ objects. However, we didn’t have the same protection in place to detect whether this particular internal state information had been corrupted. As a result, when the corruption occurred, we didn’t detect it and it spread throughout the system causing the symptoms described above. We hadn’t encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it.

During our post-mortem analysis we’ve spent quite a bit of time evaluating what happened, how quickly we were able to respond and recover, and what we could do to prevent other unusual circumstances like this from having system-wide impacts. Here are the actions that we’re taking: (a) we’ve deployed several changes to Amazon S3 that significantly reduce the amount of time required to completely restore system-wide state and restart customer request processing; (b) we’ve deployed a change to how Amazon S3 gossips about failed servers that reduces the amount of gossip and helps prevent the behavior we experienced on Sunday; (c) we’ve added additional monitoring and alarming of gossip rates and failures; and, (d) we’re adding checksums to proactively detect corruption of system state messages so we can log any such messages and then reject them.

Finally, we want you to know that we are passionate about providing the best storage service at the best price so that you can spend more time thinking about your business rather than having to focus on building scalable, reliable infrastructure. Though we’re proud of our operational performance in operating Amazon S3 for almost 2.5 years, we know that any downtime is unacceptable and we won’t be satisfied until performance is statistically indistinguishable from perfect.
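
To make the checksum fix in item (d) concrete, here is a minimal Python sketch of the idea: attach an MD5 digest to each state message and reject any message whose digest no longer matches. This is not Amazon's implementation. The message format and the names wrap, unwrap and flip_bit are my own assumptions for illustration.

```python
import hashlib
import json

DIGEST_LEN = 32  # length of an MD5 hex digest


def wrap(state: dict) -> bytes:
    """Serialize a (hypothetical) state message and prefix it with its MD5 digest."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(payload).hexdigest().encode("ascii")
    return digest + payload


def unwrap(message: bytes) -> dict:
    """Return the state, or raise ValueError if the digest does not match."""
    digest, payload = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if hashlib.md5(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("checksum mismatch: rejecting corrupted state message")
    return json.loads(payload)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate single-bit corruption in transit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


message = wrap({"server": "s1", "alive": 1})

# Flip the lowest bit of the character "1" in the payload, turning it into "0".
position = DIGEST_LEN + message[DIGEST_LEN:].index(b"1")
corrupted = flip_bit(message, position * 8)

# Without a check, the corrupted payload still parses, but the state is wrong.
print(json.loads(corrupted[DIGEST_LEN:]))

# With the check, the corrupted message is logged and rejected.
try:
    unwrap(corrupted)
except ValueError as error:
    print(error)
```

The corrupted message is still valid JSON, which matches the "still intelligible" failure Amazon describes. Only the digest comparison catches it. MD5 is adequate here because the threat is accidental corruption, not deliberate tampering.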

The failure was significant, but in my view the advantages of Amazon S3 are still very significant. A huge advantage is how quickly you can scale if need be. If your application is not hosted on Amazon S3 and it grows enormously, you have to physically deal with buying servers, installing them, installing software… All this takes time. On Amazon S3, when you need the bandwidth you can get it; when you don’t need it, you don’t have it sitting around unused. In that way it is very lean, it seems to me.

And while server infrastructure failures are bad, for most organizations the choice is not between Amazon S3 and some solution that is 100% reliable. Currently it is difficult to keep IT infrastructures online, operating, and coping with shifting demand… For many situations Amazon S3 seems to be a great resource. They need to keep improving, and they seem to be doing so. Being open and honest about the challenges is a good sign. And improving the system, not blaming a person, is another good sign.

Related: Bezos on the Internet Boom - Amazon’s Amazing Achievement - Bezos on Lean Thinking - CERN Pressure Test Failure - 12 Stocks for 10 Years Update (June 2008), Amazon is up 116% in the portfolio since 2005, just behind Google and ahead of Petro China

April 15, 2008

Management Improvement Carnival #33

Shaun Sayers is hosting Management Improvement Carnival #33 on the Capable blog.

April 8, 2008

Bezos on Internet Boom

The webcast shows Jeff Bezos, Amazon.com founder and CEO, speaking at TED on the internet boom. He compares the boom to the gold rush highlighting the similarities. But then he compares the internet to the development of industry around electricity. I think he is exactly right on the internet: “there’s more innovation ahead of us than behind us.”

Related: Bezos on Lean Thinking - Amazon Innovation - Amazon’s Amazing Achievement - Innovation Thinking with Christensen - management webcasts

July 24, 2007

Amazon’s Amazing Achievement

I have mentioned I like the way Amazon, and Jeff Bezos, have been managing in several posts. Recently Amazon has added very strong financial results to that portfolio of things they do well. Amazon earnings announcement:

Net sales increased 35% to $2.89 billion in the second quarter, compared with $2.14 billion in second quarter 2006. Excluding the $46 million favorable impact from year-over-year changes in foreign exchange rates throughout the quarter, net sales grew 33% compared with second quarter 2006.

Operating income increased 149% to $116 million in the second quarter, compared with $47 million in second quarter 2006. Net income increased 257% to $78 million in the second quarter, or $0.19 per diluted share, compared with net income of $22 million, or $0.05 per diluted share in second quarter 2006.

Pretty impressive. It seems Amazon might be able to begin delivering strong current financial performance (they have done so at least twice, and maybe longer depending on how you look at it…) and continue to build and innovate for the future. That is when a company really sets itself apart from the crowd. Previously, from the investing perspective, the argument was largely based on the belief that the steps taken today were building for the future (a fine thing, but risky - without the evidence of real profits it is often easy to make a good case for why the future will be good). In an investment it is more comforting when current earnings provide some evidence that the profits predicted for the future have some basis in reality.

Since the beginning of April Amazon’s share price has gone from $40 a share to $70. And based on the after hours trades today it is going to be in the $80s tomorrow (though after hours trades can often be misleading - there is some more confidence based on the large volume of after hours trades in Amazon, but still…). I must admit this price does seem like it might be getting a bit ahead of itself, but Amazon is making an impressive case for strong future performance.

Related: Amazon Innovation - 10 stocks for 10 years (April 2005) - 12 Stocks for 10 Years Update (June 2007) - Very Good Amazon Earnings - Bezos on Lean Thinking - Is Amazon a Bargain?

November 9, 2006

Amazon Innovation

Jeff Bezos’ Risky Bet

And, he hopes, making money. With its Simple Storage Service, or S3, Amazon charges 15 cents per gigabyte per month for businesses to store data and programs on Amazon’s vast array of disk drives. It’s also charging other merchants about 45 cents a square foot per month for real space in its warehouses. Through its Elastic Compute Cloud service, or EC2, it’s renting out computing power, starting at 10 cents an hour for the equivalent of a basic server computer. And it has set up a semi-automated global marketplace for online piecework, such as transcribing snippets of podcasts, called Amazon Mechanical Turk. Amazon takes a 10% commission on those jobs.
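
To get a feel for what those prices mean in practice, here is a rough sketch that applies the two rates quoted above (15 cents per gigabyte per month for S3, 10 cents per hour for a basic EC2 server). The workload figures and the function name are my own hypothetical choices, and the sketch ignores any other charges (such as data transfer) that the quoted passage does not cover.

```python
# Rates taken from the quoted article, kept in whole cents to avoid rounding issues.
S3_CENTS_PER_GB_MONTH = 15
EC2_CENTS_PER_HOUR = 10


def monthly_cost_cents(stored_gb: int, server_hours: int) -> int:
    """Estimate one month's bill, in cents, for storage plus basic server time."""
    storage = stored_gb * S3_CENTS_PER_GB_MONTH
    compute = server_hours * EC2_CENTS_PER_HOUR
    return storage + compute


# Hypothetical workload: 500 GB stored, one basic server running a 30-day month.
cost = monthly_cost_cents(stored_gb=500, server_hours=24 * 30)
print(f"${cost / 100:.2f}")  # prints $147.00
```

The point is less the exact number than the shape of the bill: it scales with what you actually use, and it drops to zero when you stop using it.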

In my view Amazon is doing some very interesting innovation. As with most true innovation it is not easy to understand if it will succeed or not. I believe Amazon uses technology very well. They have done many innovative things. They have been less successful at turning their technology into big profits. But I continue to believe they have a good shot at doing so going forward (and their core business is doing very well, I think). Innovation often involves taking risks. Bezos is willing to do so and willing to pursue his beliefs even if many question those beliefs. That means he has the potential to truly innovate, and also means he has the potential to fail dramatically.

Related: Bezos on Lean Thinking - Making Changes and Taking Risks - 10 Stocks for 10 Years Update - A9 Toolbar for Firefox Browser

July 29, 2005

Bezos on Lean Thinking

Topic: Management Improvement and Investing

10 Questions for Jeff Bezos, time.com via Lean Manufacturing Blog

Time: Here’s a question you probably hear all the time: read any good books lately?
Bezos: [Laughs.] I read a book recently about Toyota’s lean production methodology, which is very interesting

Jeff Bezos is the founder and CEO of Amazon.com. He really understands many quality management ideas: customer focus, long term thinking, process improvement, innovation. He also understands finance much better than most. I believe that knowledge is a large part of the reason he is not intimidated into going along with the short term thinking prevalent on Wall Street (as so many CEOs are). His huge ownership interest in Amazon and his decision to raise large amounts of cash for Amazon (by issuing bonds) during the tech boom don’t hurt either.

Amazon was one of the 10 companies selected in the 10 stocks for 10 years post. I created a Marketocracy portfolio to track that long term portfolio. The rules, at the time (for a Marketocracy portfolio), required more diversification so I added several stocks to the portfolio. I added positions in YHOO, MSFT, EMF, WMT, and BP. You can track the results of the Sleep Well portfolio.

You can also view results of another portfolio I have managed, through Marketocracy, for several years: the Darvamore Fund. This fund is much more aggressive, using the ideas of Darvas and Livermore as well as core positions that are selected for long term appreciation. Since its inception, in 2000, it has an annual rate of return 6.55% (655 basis points) higher than the S&P 500 index, as of today.

Curious Cat Management Improvement Blog © curiouscat.com 2005-2008 powered by WordPress

Author

John Hunter
