grassfrog

Thursday, September 04, 2014

Follow up on Data Center Issue

The data center issue was resolved early this morning. Total downtime on our servers was less than 2 hours, and fortunately it all fell during the "wee hours" of the morning, when traffic is typically lowest.

Here is the information I received from Hivelocity regarding this outage.

"This morning at roughly 4am EST we recognized a portion of our network was unreachable.  Most of our customers were not impacted but the portion that was were effectively offline.  We isolated the issue to the failure of a layer 3 device.   This device failure caused an uncontrollable layer 2 network loop.  Our network is designed with protected paths but this device failure resulted in a loop pattern that had unexpected results.   The physical ports towards the failed device required manual intervention to restore network services.   Once we took manual action to correct the situation we then went ahead and replaced the failed device.  Most customers began seeing restored connectivity around 5:30am with 100% of our customers being back to normal around 7:45am."

Given their track record of 100% uptime, I am inclined to believe that they have this issue well under control, and I do not expect any further incidents related to this hardware failure.

Update: Data Center Outage

The data center outage at Hivelocity that appears to be affecting several servers is actually a network hardware issue.  Network engineering is working on the issue and we hope to have things back up within a few hours.  I just checked, and their website is now back up; I am hoping the rest will follow shortly.

Please check back here for updates regarding this outage. If you would like to check the status of your website, you can always visit http://www.isup.me to see whether your site is down for everyone or simply not accessible to you.
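
If you would rather script a quick check yourself, here is a minimal sketch in Python. It is only an illustration: the function name is_reachable is my own, and it tests from your machine only, so unlike isup.me it cannot tell you whether the site is down for everyone or just for you.

```python
import socket
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if a web server answers at `url` from this machine.

    Hypothetical helper for illustration; it only reflects what your own
    network connection can see.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404 or 500),
        # so the network path to it is working.
        return True
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # DNS failure, refused connection, timeout, or dropped connection.
        return False
```

Calling is_reachable("http://www.example.com") (substitute your own site's address) returns True or False. A False result could still mean a problem on your end, so it is worth confirming with an outside checker as well.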

Failure to connect to server, 3:30 AM

I am experiencing a connectivity issue with our host, HiVelocity.net, and am trying to get hold of the data center to see what's going on. The issue appears to be a network one; I will update as soon as we find out more.