This document is available on the Internet at:  http://urbanmainframe.com/folders/blog/20041208/folders/blog/20041208/

An Expensive Mistake

Date:  8th December, 2004

Tags:

network switch

I built and installed a server for a client about a year ago.

Due to "budget constraints" it wasn't a particularly impressive machine. There was no RAID for example, only a single IDE hard disk. Despite being home to the client's business critical data, there was no redundancy: one NIC, one PSU, one HDD and no UPS...

While the budget didn't provide for a tape drive, I couldn't, in good conscience, leave the server without any means of backup, so I managed to slip a cheap DVD-Writer into the machine. I convinced the client to purchase a pack of re-writable DVDs and made sure that a backup procedure was in place which would use the disks in rotation.

Fast forward about a year. The client calls me up and tells me that, during the course of the previous night, a power failure had occurred and the server wouldn't boot.

When I visited their "computer room" (server on the floor beside a desk in the manager's office) I quickly established that the single hard drive had failed. The client bought a new HDD which I duly installed. I then reinstalled Windows 2000 Server (my nemesis) and popped in the most recent backup DVD, only to find that it was completely blank!

It seems Windows Backup had been churning away every working day for a year, giving the impression that it was safeguarding my client's data, yet not actually writing anything to the DVD (it worked when I installed it). The most recent backup we had was a snapshot backup I took when I installed the server - a year out of date.

I sat down with the client and explained the situation. During the ensuing panic and chaos, I calmly suggested that a disk recovery service might be able to retrieve the data from the dead hard disk. "Look into it," orders the client.

Five minutes with Google introduced me to Vogon International (I guess they're fans of The Hitchhiker's Guide to the Galaxy). I called them, explained the situation and suffered palpitations when they furnished me with a quote. My client, once he'd recovered from the shock of seeing the figure I wrote down, agreed to the recovery and an order was placed. The disk was couriered off to the Vogons and, a couple of days later, was returned with a handful of DVDs upon which my client's data was encoded.

I restored the data (thankfully it was all there) and brought as many services back online as I could. However, there was still a problem. My client runs a heavy-duty document management/CRM application on the server and, while I managed to install the application, I couldn't figure out how to get the data back into it.

My client, fortunately, had invested in a support and maintenance package from the software supplier, so I called them and described our recent near-death experience. I asked them to talk me through the procedure for restoring the data, only to be told that I couldn't do it! Their software is built on top of Microsoft's SQL Server, so I couldn't just import the data back into the application: databases had to be rebuilt, the file system restored, and all kinds of other technicalities addressed. The only way to bring the system back online, they told me, was to have one of their engineers onsite to rebuild the system. Oh, and because the failure was hardware related and thus not a fault of their application, my client would have to pay for the engineer's visit!

Now it's time to look at a few figures. Firstly, let's consider what it would have cost to provide a few safeguards to the server:

  • Sony AIT i50-A/S Tape Drive - £286
  • Tape Media - £100 (?)
  • Second 80GB HDD (for software RAID 1) - £40
  • APC Smart UPS - £351

Total: £777

Compare the above cost with the costs incurred as a result of not doing it properly the first time:

  • New HDD - £40
  • Data Recovery - £2,000
  • Software Engineer - £900
  • Sony AIT i50-A/S Tape Drive - £286
  • Tape Media - £100

Total: £3,326
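
For anyone who wants to check the sums, here is a small Python sketch that adds up the two lists above and compares them. The figures are copied straight from the lists; the variable names are purely illustrative, and the tape media price is the same rough estimate marked with a question mark in the first list.

```python
# Cost of the safeguards that were left out of the original build (GBP).
safeguards = {
    "Sony AIT i50-A/S Tape Drive": 286,
    "Tape Media": 100,  # rough estimate, flagged as uncertain in the list
    "Second 80GB HDD (software RAID 1)": 40,
    "APC Smart UPS": 351,
}

# Costs actually incurred after the disk failure (GBP).
aftermath = {
    "New HDD": 40,
    "Data Recovery": 2000,
    "Software Engineer": 900,
    "Sony AIT i50-A/S Tape Drive": 286,
    "Tape Media": 100,
}

prevention_total = sum(safeguards.values())
aftermath_total = sum(aftermath.values())
extra = aftermath_total - prevention_total
ratio = aftermath_total / prevention_total

print(f"Doing it properly:     £{prevention_total:,}")
print(f"Not doing it properly: £{aftermath_total:,}")
print(f"Extra spend:           £{extra:,} ({ratio:.1f}x the cost of prevention)")
```

Note that the aftermath figure still doesn't include a UPS or a second disk, so the comparison is, if anything, generous to the cheap option.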

That's not to mention the unquantifiable costs of a week's downtime - and there's the small matter of my invoice too!

Conclusions

Don't ever compromise on the specification of business-critical hardware. Don't ever dismiss the obvious benefits of redundancy and fault-tolerance. Always verify your backups.
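
As an illustration of that last point, here is a minimal Python sketch of one way to verify a backup: compare a SHA-256 checksum of every file in the live data directory against its copy on the backup media. It assumes the backup is a plain file-for-file copy that can be mounted and read as a directory, which is not how Windows Backup stores its archives, so treat it as a sketch of the idea rather than a description of the procedure my client used. The function names (file_digest, verify_backup) are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy under backup_dir.

    Returns (files_checked, problems), where problems is a list of
    (relative_path, reason) tuples and reason is "missing" or "mismatch".
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    checked = 0
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        checked += 1
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "mismatch"))
    return checked, problems
```

A blank disc like the one I found would show up immediately: every file would be reported as "missing". It is also worth treating a run that checks zero files as a failure in its own right, since an empty source directory would otherwise pass silently.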

Credits

The photograph of the networking hardware was taken by Alex Foley and was sourced via morgueFile.