grassfrog

Sunday, June 01, 2008

Data Center Update

Here's a quick update on the Data Center outage. At approximately 5pm CDT today (Sunday), the Data Center will begin powering up and testing their network systems, turning on air conditioning systems and monitoring environmental conditions. These tests are projected to run approximately 4 hours, after which time they will begin bringing up banks of servers in phases.

Here is an excerpt from an article on DataCenterKnowledge.com that describes the incident in more detail:

Explosion at The Planet Causes Major Outage

An electrical explosion and fire Saturday at a Houston data center operated by The Planet has taken the entire facility offline. The explosion at 5 pm Saturday has affected 9,000 customer servers, and the company says it hopes to restore service by late Sunday afternoon. No servers or network equipment were damaged by the explosion, but the data center is without power. The Planet said it is working with the fire department and its facilities staff to restore power and get servers back online.

"(Saturday) evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room," said Doug Erwin, CEO of The Planet, in a message on the company's forums. "Thankfully, no one was injured. In addition, no customer servers were damaged or lost."

Early indications are that the fire was caused by a short in a high-volume wire conduit. The fire department is not allowing the company to run backup generators, so the facility has been without power since the incident occurred. In the latest update, The Planet says it plans to run a power test at 5 pm Central time. If the test is successful, The Planet will begin bringing customer servers back online.

The explosion affected only the main Houston data center, with no impact on any of The Planet's other five data centers. The company hosts more than 50,000 servers and 22,000 customers in its six data centers, meaning that about a third of its customers and 20 percent of customer servers are currently offline.

This is the second time an explosion and fire has occurred at the Houston data center, which had a transformer explode in June 2003, when the company was known as RackShack.

Many of the customers affected are accounts from EV1Servers, the Houston-based dedicated hosting specialist that was acquired by The Planet in 2006 as part of a larger transaction in which private equity firm GI Partners bought both companies. The Houston data center houses the servers for ServerCommand, the management portal for former EV1 customers. "We are in the process of moving the ServerCommand servers to other Houston data centers so that we’re able to loop them into communications," the company said.

The Planet has sought to move accounts to get customers back online, with limited success. "During the early stages of the H1 data center [outage] we opportunistically relocated some customers to another data center," wrote Urvish Vashi, the Director of Product Management for The Planet. "However, due to network and data center (power/cooling) constraints, this option is no longer available and requests for migration cannot be honored."

The Planet's main page was knocked offline briefly, according to monitoring from Netcraft, but was back online in less than an hour. The Planet's forums are also online, but are experiencing serious availability problems due to heavy traffic, including a Slashdotting.

"This is a significant outage, impacting approximately 9,000 servers and 7,500 customers," said Erwin. "All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock." Center Networks has noted the strong customer communication during the outage.