#15
02-19-09, 18:38
RichardM
BOD Member
 
Join Date: Jan 2008
Location: NYC
Posts: 88
Only one problem:

Namely, it didn't work.

The setup and migration went well, and Alex was there with prompt, knowledgeable support every step of the way. I tested the setup by shutting down httpd on the master server, and it worked: the slave server stepped up to the plate, and downtime was minimal.

But a few hours ago, the hard drive on the master server failed, the sites went down, and nothing happened. The slave server didn't kick in, and the annoyed phone calls began.

I can access VZ on the slave server, but nothing else: no HTTP, no mail, not even SSH so that I could attempt to restart services. The server responds to a ping, but that's all. Everything is down -- and my clients are getting annoyed.

I understand that hard drives do fail (although I wonder why RAID wasn't used -- this was something I didn't notice when I looked at the specs). But the purpose of this whole mirroring setup was to protect against downtime caused by such failures. As of now, all of my sites have been down for several hours, and I will have to eat the costs for failing to meet my uptime guarantee to my clients.

Worse yet, I have lost a lot of credibility.

I'm disappointed that the first real-world test of this mirroring arrangement has failed. I'd like to know why, and I'd like to know what's being done to address that "why" so it doesn't happen again. I made promises to my clients based on your promises to me, and I don't like breaking my promises.

Looking forward to your response,

Richard