I’m posting this here (as well as the forums on Tucows Discuss) since I know that some of our customers read this blog from time to time.
Here’s what I know — a more official notice should be coming your way shortly.
The actual applications, for the most part, continued to work. Folks with Blogware accounts could post to their blogs, Email Defense continued scanning for spam and viruses, Certs kept working, and Email was queued for a little while and began processing again in the early evening. No mail was lost.
What stopped working during the outage was Provisioning and Management. The specifics are a little different for each service, but in general it was the ability to sell new instances of services or to manage existing ones. Blogware, being located in a different facility, was unaffected.
It’s very unusual for a redundant power system to fail like this. The “redundant” in “redundant power system” makes this sort of thing very unlikely. For example, if you have two independent systems, each with a 1% probability of failing, the probability that both will fail is 1% of 1%, or one one-hundredth of 1% (0.01%).
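The arithmetic behind that example can be checked with a minimal Python sketch. It assumes the two systems fail independently of each other. The example above relies on this but does not state it. `prob_all_fail` is a hypothetical helper name used only for illustration. Exact fractions are used so the result is not blurred by floating-point rounding.

```python
from fractions import Fraction

def prob_all_fail(probs):
    """Probability that every system fails, assuming the failures are independent."""
    result = Fraction(1)
    for p in probs:
        result *= p  # independent events: multiply the individual probabilities
    return result

p = Fraction(1, 100)          # 1% chance that a single power system fails
both = prob_all_fail([p, p])  # both redundant systems fail
print(both)                   # 1/10000
print(both == p / 100)        # True: one one-hundredth of 1%
```

Adding the probabilities (0.01 + 0.01) would instead approximate the chance that *at least one* system fails. That figure is not the chance that both fail.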
The cause of the failure is not yet known. We’re working with IBM to identify it. Once we know, we will take steps to prevent this from happening again and let you know what we will be doing in the future.
3 replies on “Update on the Tucows Service Outages”
May I correct you, because you are 100% off: if you have two systems, each with a 1% probability of failing, the probability that both will fail is TWO one-hundredths of 1%.
Stu Savory
http://www.savory.de/blog.htm
Mm, no… First one must fail (.01) and then, in the event that it does, the other must also fail (.01), so we multiply those chances together and get .0001.
-Devin
assuming the instances are not covariant
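The caveat in that last reply can be made concrete. When two failures are correlated, the chance that both happen is P(A)·P(B) plus the covariance of the two failure indicators. Consider two systems that each fail with probability p, and suppose their failure indicators have correlation rho. In that case the chance that both fail is p² + rho·p·(1 - p).

The minimal sketch below assumes that model. `prob_both_fail` and the rho values are hypothetical and purely illustrative. They are not measurements of the actual power systems.

```python
from fractions import Fraction

def prob_both_fail(p, rho):
    """P(both fail) for two systems that each fail with probability p,
    where rho is the correlation between their failure indicators.

    Uses P(A and B) = P(A)P(B) + Cov(A, B), with Cov = rho * p * (1 - p)
    for two Bernoulli(p) indicators. rho = 0 is the independent case.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    joint = p * p + rho * p * (1 - p)
    # A joint probability must lie between max(0, 2p - 1) and p.
    if not max(0, 2 * p - 1) <= joint <= p:
        raise ValueError("rho is not feasible for this p")
    return joint

p = Fraction(1, 100)
for rho in (Fraction(0), Fraction(1, 10), Fraction(1)):
    print(rho, prob_both_fail(p, rho))
# 0 1/10000      independent: one one-hundredth of 1%
# 1/10 109/100000
# 1 1/100        perfectly correlated: redundancy buys nothing
```

Even a modest correlation (rho = 0.1 here) raises the chance of a double failure roughly elevenfold. This is why shared causes matter so much for "redundant" systems.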