AWS operations explains that it was tougher to restart its S3 index system this time than the last time they tried to restart it.
The Feb. 28 outage of Amazon’s Simple Storage Service says one thing loudly and clearly: Amazon Web Services is growing so fast that it must rely on its automated systems to keep operating. Sometimes things need to go awry in only a minor way, and the sheer scale involved puts those systems at risk of disabling it.
That conclusion is implicit in Amazon’s explanation of the event March 2. Rapid growth seems an inadequate explanation for a S3 freeze up that left Slack, Quora, Mailchimp, Giphy and many other major Web services frozen in time for four hours.
The AWS operations team said that an employee’s data entry error in correcting a bug in its storage billing system lead to a cloud service outage in one region of Its Ashburn, Va., data center complex, U.S. East-1.