Down Time

Many of you may have noticed that our site was down for most of yesterday. I’m still not sure what happened, but I was alerted around 10am that people were getting 500 Internal Server Errors. Somewhere between then and 12pm, people started getting redirected to 127.0.0.1.

I contacted 1and1 (our web hosting provider) and the first-tier customer support had no idea what was going on. I knew requests were still reaching their servers, since pages requiring basic auth still prompted for credentials. But afterwards, it’d still redirect you to 127.0.0.1. They told me they were going to escalate the issue. I asked what the ETA for a fix was, and they told me they had no idea. I asked when I should call back if the issue was not resolved. At first they told me 24 hours. Then they said it might take up to 48 hours. I was not pleased with the response, but I guess that’s what you get for cheap hosting.

After 12 hours of down time, I decided to move my site onto DreamHost temporarily. During this move, I made some changes that’ll make my life easier if I decide to port it onto another server. It was a good thing I still had shell access to my 1and1 account. Hurray for rsync for making my life super easy. I was also able to access the databases to get the latest updates since our last backup.
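For anyone facing a similar move, here is a rough sketch of the kind of rsync pull over SSH that this sort of migration relies on. It is not the exact command I ran: the user, host, and paths are placeholders, and the helper name `build_rsync_command` is made up for illustration. It only builds the command so you can inspect it (or try `--dry-run`) before copying anything.

```python
import shlex


def build_rsync_command(user, host, remote_dir, local_dir, dry_run=False):
    """Build an rsync-over-SSH command that pulls remote_dir into local_dir.

    All arguments are placeholders; nothing here reflects the real accounts.
    """
    # -a preserves permissions and timestamps, -z compresses in transit,
    # -e ssh tunnels the transfer through the shell account.
    cmd = ["rsync", "-az", "-e", "ssh"]
    if dry_run:
        # Show what would be transferred without copying anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd.append(f"{user}@{host}:{remote_dir.rstrip('/')}/")
    cmd.append(local_dir)
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_rsync_command(
        "user", "old-host.example.com", "/path/to/site", "./site-copy", dry_run=True
    )
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would perform the copy once the dry run looks right.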

I also took some time to move the blog onto its own subdomain. All previous links will still work, as they’ll be redirected to http://blog.hd-trailers.net/. If you do see broken links or things not working in general, please do report the issue. Having the blog on a different subdomain serves a couple of purposes:

  1. It allows us to bypass the WordPress cookie for our main catalog.
  2. It allows us to run each site under a different user.
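To give a feel for how the old links keep working, here is a small sketch of the URL mapping behind such a redirect. It is an illustration, not the actual server configuration: it assumes the blog used to live under a `/blog` path on the main site (a hypothetical prefix), and the function name `redirect_target` is made up. Only the destination host comes from this post.

```python
from urllib.parse import urlsplit, urlunsplit

NEW_BLOG_HOST = "blog.hd-trailers.net"
# Assumption: hypothetical old location of the blog on the main site.
OLD_BLOG_PREFIX = "/blog"


def redirect_target(old_url):
    """Return the new blog URL for an old blog link, or None if not a blog link."""
    parts = urlsplit(old_url)
    path = parts.path
    # Match "/blog" or "/blog/..." but not unrelated paths like "/blogroll".
    if path != OLD_BLOG_PREFIX and not path.startswith(OLD_BLOG_PREFIX + "/"):
        return None
    # Drop the old prefix and keep the rest of the path, query, and fragment.
    new_path = path[len(OLD_BLOG_PREFIX):] or "/"
    return urlunsplit(("http", NEW_BLOG_HOST, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical old link, for illustration only.
    print(redirect_target("http://www.hd-trailers.net/blog/some-post/?p=1"))
```

In practice the web server would answer matching requests with a 301 pointing at the returned URL, so bookmarks and search results keep working.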

I apologize for any inconvenience this has caused. Maybe it’s time I get a dedicated server.

Database Issues

Looks like one of our WordPress plugins was generating excessive load on our 1and1 DB. The culprit appears to be Recently Popular, which was running some sophisticated SQL queries. I’ve since disabled it and switched to WordPress.com Stats Helper, which stores the visitor data on their end.

Let’s hope this resolves the issue and that we’ll be seeing fewer 500 Internal Server Errors. Maybe it’s time to look into dedicated servers or co-location hosting. If so, I’m definitely not going with 1and1, given how unhappy I was with how they dealt with this situation.

I sent the following email to them:

Thanks for clarifying the issue. I’ve disabled the plugin that generates the queries below and hopefully will resolve the issue.

However, there are several questions/expectations I have, which I feel 1and1 could have done a lot better at.

1. The T&C’s are very vague about “excessive load”. Is there a way for me to see what type of load I’m generating and if there is, what defines excessive load? If there isn’t a way, how does a customer even know when he’s generating excessive load?

2. When a customer is putting excessive load on shared resources, I would’ve expected warnings or at minimal a notification that my database was going to be shut down. I found out several hours later my website was no longer able to connect to the database. I then had to call in inquiring about why my database was “closed” and the customer service on the other end was not able to give me a reason because the department in charge of this was closed. I’m extremely unhappy with how this situation was dealt with.

Given this, what’s the next step in restoring the database which you have shut down.

Their reply:

It appeared as if you were using MySQL for logging, so we would suggest not using that. Depending on the severity of the load, we do send out warnings before stopping connectivity, but in your case our System Admins in Germany acted as they did because your database was already affecting our other customers on the server.

I like how their reply hardly answered any of my questions…