| Fastmail.FM Help and Current Issues: This forum is for users to help each other to solve any problems they are having using FastMail.FM. It's also the place to discuss problems such as outages, slowness or other similar issues. |
|
|
#1 |
|
Junior Member
Join Date: May 2005
Posts: 15
|
Fastmail Down - 1:13am PST 5/25/2005
Fastmail is currently down, and my business website went down along with it.
|
|
|
|
|
|
#2 |
|
Essential Contributor
Join Date: Feb 2002
Location: Selangor, Malaysia
Posts: 455
|
I'm currently facing problems too. It says "contacting server" and then stops there. Ironically, http://admin.fastem.com is having the same problem.
|
|
|
|
|
|
#3 |
|
The "e" in e-mail
Join Date: Jul 2004
Location: Oslo, Norway
Posts: 2,380
Representative of:
Fastmail.fm |
Our shiny new RAID array seems to have taken out the server with the filestorage on it, and everything else is timing out trying to contact it.
We're working on bringing machines back to life now. Bron. |
|
|
|
|
|
#4 |
|
Junior Member
Join Date: May 2005
Posts: 15
|
Is there any ETA for when it will be back up? Should I pause my Google AdWords campaigns? I hate to pause them because it can cause me to lose rank.
|
|
|
|
|
|
#5 |
|
The "e" in e-mail
Join Date: Jan 2002
Location: The Netherlands
Posts: 4,110
|
Seems to be back up now.
|
|
|
|
|
|
#6 |
|
The "e" in e-mail
Join Date: Jul 2004
Location: Oslo, Norway
Posts: 2,380
Representative of:
Fastmail.fm |
Yeah, it's all back up.
I'll post more after I've got the kids in bed. |
|
|
|
|
|
#7 |
|
Master of the @
Join Date: Jul 2002
Location: A.U
Posts: 1,980
|
I do hope they are queueing the mail, as I'm missing one.
|
|
|
|
|
|
#8 |
|
The "e" in e-mail
Join Date: Jul 2004
Location: Oslo, Norway
Posts: 2,380
Representative of:
Fastmail.fm |
I've posted a comment to the status blog and linked to this thread. I promised I'd get into more technical details here.
We've purchased two new 4TB SATA arrays. Our current storage was filling up as people started using their increased quotas. There's only space in our current cabinets to add one of them, so we connected it to the same machine as the current backups and filestorage, planning to move backups across to the new disk, freeing up the rest of the older SATA unit for the increased filestorage we'll get once DAV/FTP is rolled out.
The first problem was that the driver for our SCSI card in Linux only supports 2TB disks. We worked around that by splitting the array (something that's very easy to do in its config menu), and also by upgrading the kernel to support LVM to rejoin the pieces into one big disk!
The only problem is, the internal RAID array in the box uses a driver which is broken in newer Linux kernels - we found the patch which claims to fix it and applied it, but you saw what happened - the internal RAID disappeared from the system and everything went pear-shaped! We rebooted the server and it came up fine - then switched back to the last known stable version of the kernel on that box, adding the LVM support back in. I forgot to add SCSI "check all LUNs" support (I claim holding a crying baby at the time as the excuse for my mistake), so it took yet another reboot to add that back in as well.
At this point all the disks were back, and the other problem surfaced. The SATA arrays are mounted via NFS from the web servers to provide filestorage access. The web servers started freezing up waiting for responses from the NFS server, which then sent all the other servers into a spiral waiting for services. I saw loads over 1000 on one of our frontend boxes! We restarted either services (if the boxes weren't too overloaded by then) or entire servers until we got everything back under control.
The NFS server machine has always been very reliable - though we have had issues with a SCSI cable in the past, and now these problems with having the additional device added.
We're really hoping the kernel downgrade has fixed the problem for now, and we have another step available after this (add a separate SCSI card and put the new array on a different channel). We'd prefer to separate things more, but we don't have additional space available in the cabinets yet.
Our apologies to everyone who was affected by this, Bron. |
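For readers wondering where a 2TB ceiling like the one mentioned above usually comes from: the post doesn't say, but a common cause is 32-bit block addressing with 512-byte sectors. The sketch below works through that arithmetic and shows why a 4TB array has to be split in two before LVM rejoins it. The sector size, the address width and the function names are all assumptions for illustration, not details of FastMail's hardware or driver.

```python
# Assumptions (not from the post): 512-byte sectors and 32-bit block addresses.
SECTOR_BYTES = 512
LBA_BITS = 32


def max_lun_bytes(sector_bytes=SECTOR_BYTES, lba_bits=LBA_BITS):
    """Largest device addressable with lba_bits-wide block addresses."""
    return sector_bytes * 2 ** lba_bits


def luns_needed(array_bytes, limit_bytes=None):
    """Number of pieces an array must be split into so each fits the limit."""
    if limit_bytes is None:
        limit_bytes = max_lun_bytes()
    # Integer ceiling division, avoiding floating point rounding.
    return -(-array_bytes // limit_bytes)


array_bytes = 4 * 10 ** 12           # a "4TB" array in decimal terabytes
limit = max_lun_bytes()              # 2 TiB under the assumptions above
pieces = luns_needed(array_bytes)    # pieces to present to the SCSI driver
print(f"limit per LUN: {limit} bytes, pieces needed: {pieces}")
```

Under these assumptions each piece stays below the limit, and a volume manager such as LVM can then present the pieces as one large volume again.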
|
|
|
|
|
#9 |
|
Member
Join Date: Mar 2005
Location: New York, USA
Posts: 81
|
I have 2 questions:
1. Was there any email loss during this downtime (bounced, deleted, undelivered, etc.)?
2. How many developers does FastMail have? It seems like brong is the only one who's fixing bugs and adding features... Maybe you guys need to hire some additional help to speed things up. |
|
|
|
|
|
#10 |
|
The "e" in e-mail
Join Date: Jul 2004
Location: Oslo, Norway
Posts: 2,380
Representative of:
Fastmail.fm |
I'm the only one who's being chatty on the forum anyway!
On the issue of lost email - absolutely not. This is one very nice thing about email (and the Postfix server we use takes great pains to be reliable) - if the receiving host can't guarantee that it has a copy then it doesn't tell the sending host that reception succeeded, and so the sending host queues the message and tries again. We have lost mail in the past for short periods, but in each case it's been due to misconfiguration rather than server failure.
Besides, in this case the problem, while serious, affected the NFS infrastructure only, so it took down the web servers and the frontend proxy servers (web servers waiting for disk responses from NFS, and frontend servers overloaded queueing requests for the web servers!). The database and MX servers, which are the only ones depended on for mail reception, were all fine. They wouldn't have even noticed! Bron. |
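The "no acknowledgement, so the sender retries" behaviour described above can be sketched as a small simulation. This is only an illustration of the general idea, not Postfix code or the SMTP protocol itself: the `Receiver` and `Sender` classes and the message names are hypothetical, and a real mail server retries on a schedule rather than on demand.

```python
from collections import deque


class Receiver:
    """Hypothetical receiving host: acknowledges only after storing a copy."""

    def __init__(self):
        self.up = True
        self.stored = []

    def accept(self, message):
        if not self.up:
            return False              # no acknowledgement, sender keeps its copy
        self.stored.append(message)   # store the message first...
        return True                   # ...and only then acknowledge it


class Sender:
    """Hypothetical sending host: keeps each message until it is acknowledged."""

    def __init__(self):
        self.queue = deque()

    def submit(self, message):
        self.queue.append(message)

    def flush(self, receiver):
        """Try every queued message once; requeue the ones not acknowledged."""
        for _ in range(len(self.queue)):
            message = self.queue.popleft()
            if not receiver.accept(message):
                self.queue.append(message)


receiver = Receiver()
sender = Sender()

receiver.up = False                  # simulate an outage on the receiving side
sender.submit("msg-1")
sender.submit("msg-2")
sender.flush(receiver)               # nothing acknowledged, both stay queued
queued_during_outage = len(sender.queue)

receiver.up = True                   # receiving side recovers
sender.flush(receiver)               # the retry now succeeds
print(queued_during_outage, receiver.stored, len(sender.queue))
```

The point of the design is that a message is never in zero places at once: the sender only drops its copy after the receiver has confirmed it holds one.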
|
|
|
|
|
#11 |
|
Member
Join Date: Mar 2005
Location: New York, USA
Posts: 81
|
That's good news, Bron. Keep up the good work.
|
|
|
|
|
|