| FastMail.FM General Discussions: Everything that does not belong in the help or feature requests forums goes here. This includes discussion about FastMail.FM policies, development (such as stylesheet development), FastMail.FM support sites like the Wiki, and so forth. |
#1 |
|
Moderator
Join Date: Aug 2001
Location: USA Northwest
Posts: 3,842
|
I don't see any notice that this was scheduled, but FM is down and has been for over half an hour.
-----------------
This screen was last refreshed at: Fri Jan 4 22:36:10 2002
The current time is Fri Jan 4 22:42:06 2002
When we last checked, the server was DOWN.
The server has been down since Fri Jan 4 21:47:31 2002
-------------------- |
|
|
|
|
|
#2 |
|
Moderator
Join Date: Nov 2001
Location: Milliways
Posts: 1,165
|
yea...noticed the same...sitting here and waiting..but i got plenty time
|
|
|
|
|
#3 |
|
Moderator
Join Date: Nov 2001
Location: Milliways
Posts: 1,165
|
...seems to be working again...
|
|
|
|
|
|
#4 |
|
Junior Member
Join Date: Jan 2002
Posts: 2
|
Also...
I had problems yesterday afternoon sending messages from the web interface and IMAP was hanging in Mozilla. |
|
|
|
|
|
#5 |
|
Junior Member
Join Date: Jan 2002
Posts: 2
|
It's back!
It looks like it is working again!
|
|
|
|
|
|
#6 |
|
Ultimate Contributor
Join Date: Sep 2001
Location: Australia
Posts: 11,499
|
Well, it's been a while since we had an outage of more than a few minutes, because we've got a few backup systems in place now. So for a substantial problem to happen now requires multiple failures, which is what happened this morning.
The web server was down for about an hour and a half. All IMAP and mail services continued to operate normally. The point of failure was a program we have that provides IMAP connections to the web processes. When this failed, a number of logs were created, which we're now looking at to determine exactly what problem occurred.
Normally when there is a problem, our regular testing program, which runs every 20 minutes, will identify it and attempt to take corrective action, as well as notifying us and logging diagnostic information. The program correctly identified the problem and attempted to restart the relevant services (in this case, the web service). Unfortunately, after we added the front-end a couple of weeks ago that compresses data, we forgot to include something to restart the front-end when corrective action is taken. So the problem with the communication between the web server and IMAP server was not corrected. The monitoring program hadn't needed to restart any services since we added the compressing front-end, so this problem hadn't occurred before.
After corrective action is taken, the monitoring program waits 60 seconds and tries again. Normally, if the corrective action failed, the 2nd monitoring attempt would fail, and at this point Rob and I get paged. However, the nature of the problem was such that the partial restart of the backend actually resulted in the server working correctly for a couple of minutes, so this 2nd attempt actually succeeded, and we didn't get paged... But of course 20 minutes later when it tried again, it failed again. And again the corrective action failed to fix it.
Each time corrective action is taken, Rob and I are sent a warning email, but we didn't see these because it's night-time in Australia. When I got up I saw the warning emails (one every 20 minutes for a 90 minute period) and manually fixed the problem.
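For anyone who wants to follow the logic, the monitoring cycle described above can be sketched roughly like this. It is only an illustration of the behaviour as described in this post, not the actual FastMail.FM monitoring program: the `Monitor` class, its field names, and the returned status strings are all hypothetical, and the real checks and restarts are replaced by plain callables.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

RETRY_DELAY_SECONDS = 60     # from the post: wait 60 seconds, then test again
CHECK_INTERVAL_MINUTES = 20  # from the post: the test runs every 20 minutes


@dataclass
class Monitor:
    """Hypothetical model of one test cycle; every name here is an assumption."""
    check: Callable[[], bool]                  # True when the service responds
    restarts: List[Callable[[], None]]         # every component to restart
    send_warning_email: Callable[[str], None]  # sent whenever action is taken
    page: Callable[[str], None]                # sent only if the retest fails
    sleep: Callable[[float], None] = time.sleep

    def run_cycle(self) -> str:
        if self.check():
            return "ok"
        # Corrective action: restart everything in the list. A component
        # missing from this list (such as a newly added front-end) is
        # silently left in its broken state.
        for restart in self.restarts:
            restart()
        self.send_warning_email("corrective action taken")
        self.sleep(RETRY_DELAY_SECONDS)
        if self.check():
            # A brief recovery is enough to pass here, so nobody is paged.
            return "recovered"
        self.page("still down after corrective action")
        return "paged"


if __name__ == "__main__":
    # Simulate two cycles: each fails first, then passes the 60-second retest.
    results = iter([False, True, False, True])
    emails: List[str] = []
    pages: List[str] = []
    monitor = Monitor(
        check=lambda: next(results),
        restarts=[lambda: None],          # stand-in for the web service only
        send_warning_email=emails.append,
        page=pages.append,
        sleep=lambda seconds: None,       # skip the real wait in the demo
    )
    print(monitor.run_cycle(), monitor.run_cycle(), len(emails), len(pages))
```

In this simulation every cycle ends in "recovered", so warning emails pile up but no page is ever sent, which matches the gap described above: a retest that passes briefly hides a fault that returns before the next 20-minute check.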
Anyway, the bit of good news is that when we have a problem like this it gives us a lot of information on how to avoid it next time. Those of you who have been with FastMail.FM for a while have hopefully noticed how its reliability has been consistently improving. As a result of this latest problem I'm going to make some more changes:
|
|
|
|
|
|
#7 |
|
Moderator
Join Date: Aug 2001
Location: USA Northwest
Posts: 3,842
|
while the sheriff sleeps, where's the deputy?
First off, I'm a happy 'customer' and I've PAID for far worse service than this. Sorry to give you such a rude awakening this morning.
Weren't you going to have some users with admin access who could restart things when there was a problem like this? I went to the admin site to diagnose it, and had I been an admin, it was ready for me to cut loose with the remedy. At least a few key users should be allowed to email your pager and have it wake you. |
|
|
|
|
|
#8 |
|
Ultimate Contributor
Join Date: Sep 2001
Location: Australia
Posts: 11,499
|
Yes, there are 7 users with exactly this (the capability to page me and Rob). But none of them paged me until I was already up (not surprising at this time of year). I really need to get a few more people on board.
Shelded--shoot me an email if you don't mind doing this and I'll send you an admin username and password. |
|
|
|
|
|
#9 |
|
Senior Member
Join Date: Nov 2001
Location: CT, USA
Posts: 124
|
This gives me a good feeling
Quote:
|
|
|
|
|
|
|
#10 |
|
Junior Member
Join Date: Nov 2001
Posts: 10
|
Indeed! I second that! Very professional, Jeremy.
Glad to be an end user, Anthony. |
|
|
|
|
|
#11 |
|
Junior Member
Join Date: Dec 2001
Posts: 3
|
Happy New Year..
I am your happy customer. Keep up the good work. Cheers! |
|
|
|