Thursday, September 04, 2014

Follow-Up on Data Center Issue

The data center issue was resolved early this morning. Total downtime on our servers was less than 2 hours, and fortunately all of it fell during the "wee hours" of the morning, when traffic is typically at its lowest.

Here is the information I received from Hivelocity regarding this outage.

"This morning at roughly 4am EST we recognized a portion of our network was unreachable.  Most of our customers were not impacted but the portion that was were effectively offline.  We isolated the issue to the failure of a layer 3 device.   This device failure caused an uncontrollable layer 2 network loop.  Our network is designed with protected paths but this device failure resulted in a loop pattern that had unexpected results.   The physical ports towards the failed device required manual intervention to restore network services.   Once we took manual action to correct the situation we then went ahead and replaced the failed device.  Most customers began seeing restored connectivity around 5:30am with 100% of our customers being back to normal around 7:45am."

Given their track record of 100% uptime, I am inclined to believe that they have this issue well under control, and I do not expect any further incidents related to this hardware failure.