Medium traffic causes WordPress server to require a hard reboot

Arth asked:

We are having trouble with our Rackspace WordPress server falling over under medium traffic after we send out a marketing email.

The server specs are:

CPU          2 vCPUs
RAM          2 GB
System disk  80 GB
Network      240 Mb/s
Disk I/O     Good

Running:

CentOS     7.0
WordPress  4.3.1
httpd      2.4.6
PHP        5.4.11
MariaDB    5.5.41

The installation is fairly standard as far as I can tell, and the database is standard, indexed and fairly small. We also use WordPress object caching.

According to New Relic, during normal traffic the site spends about 80% of its time in PHP, 15% in web external calls and only a small percentage in the database. The average app time for a standard page is around 800 ms, which seems slow to me.

Running a load test of 250 connections over 1 minute makes the connections take progressively longer; they start timing out after about 30, and the server becomes unresponsive (even after the traffic dies back down). It requires a hard reboot to recover.

I can’t connect using PuTTY, and the home page oscillates between timing out and returning the dreaded ‘Error establishing a database connection’.

Using the Rackspace monitoring agent on the most recent test, it appears that CPU maxes out at 100% just before the server dies, and used memory peaks at about 1.6 GB with free memory dropping to about 100 MB. About 2 GB of swap (of 4 GB total) is in use as well. Standard usage appears to be about 15% CPU, 800 MB memory and 400 MB swap.
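The same figures the Rackspace agent reports can be sampled directly from /proc, which helps when the agent's graphs are too coarse to show the last minute before a hang. This is a minimal sketch assuming a Linux host; the function names are illustrative, and "used" is the rough MemTotal - MemFree - Buffers - Cached definition that older versions of `free` use.

```python
# Minimal sampler for RAM used, free RAM, swap used and 1-minute load average.
# Linux-only: reads /proc/meminfo and /proc/loadavg.
import time


def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo fields converted from kB to MiB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0]) / 1024
    return values


def summarize(mem: dict, loadavg_text: str) -> dict:
    # Approximation: page cache and buffers are treated as reclaimable.
    used = mem["MemTotal"] - mem["MemFree"] - mem.get("Buffers", 0) - mem.get("Cached", 0)
    return {
        "ram_used_mb": round(used),
        "ram_free_mb": round(mem["MemFree"]),
        "swap_used_mb": round(mem["SwapTotal"] - mem["SwapFree"]),
        # On a 2 vCPU box, a load above 2 means runnable work is queueing.
        "load1": float(loadavg_text.split()[0]),
    }


def sample_forever(interval_s: float = 5.0) -> None:
    """Print one line per interval; pipe it to a log to see the trend before a hang."""
    while True:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        with open("/proc/loadavg") as f:
            load = f.read()
        print(time.strftime("%H:%M:%S"), summarize(mem, load), flush=True)
        time.sleep(interval_s)
```

Running `sample_forever()` during the next load test gives a per-second-scale view of when swap starts climbing relative to CPU load.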

Our Apache config doesn’t set any of the following (no files in /etc do): Timeout, KeepAlive, MaxKeepAliveRequests, KeepAliveTimeout, so I’m guessing it is using the default values.
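To confirm which values Apache is actually running with, a small script can scan the config text and fall back to the Apache 2.4 documented defaults (Timeout 60, KeepAlive On, MaxKeepAliveRequests 100, KeepAliveTimeout 5) for anything unset. This is a text-scanning sketch, not a real config parser, and the /etc/httpd root is an assumption based on the usual CentOS layout.

```python
# Report effective values for a few Apache 2.4 directives, falling back to the
# documented compiled-in defaults when no config file sets them.
# This only scans text: it does not evaluate <IfModule>, <VirtualHost> or the
# order in which Include files are actually read.
from pathlib import Path

APACHE_24_DEFAULTS = {
    "Timeout": "60",
    "KeepAlive": "On",
    "MaxKeepAliveRequests": "100",
    "KeepAliveTimeout": "5",
}


def effective_settings(config_text: str) -> dict:
    """Later occurrences override earlier ones; comment lines are skipped."""
    settings = dict(APACHE_24_DEFAULTS)
    by_lower = {name.lower(): name for name in APACHE_24_DEFAULTS}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Directive names are case-insensitive in Apache.
        name = by_lower.get(parts[0].lower())
        if name and len(parts) == 2:
            settings[name] = parts[1].strip()
    return settings


def read_configs(root: str = "/etc/httpd") -> str:
    """Concatenate every *.conf file under root (assumed CentOS layout)."""
    base = Path(root)
    if not base.is_dir():
        return ""
    files = sorted(p for p in base.rglob("*.conf") if p.is_file())
    return "\n".join(p.read_text(errors="replace") for p in files)


if __name__ == "__main__":
    for directive, value in effective_settings(read_configs()).items():
        print(f"{directive} {value}")
```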

I’ve looked at the MariaDB settings:

innodb_buffer_pool_size = 1400M
max_user_connections = 0

These don’t seem to be the cause.
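Whether the buffer pool is a factor is easier to judge with the memory arithmetic written out. This sketch assumes 2 GB means 2048 MB and counts the buffer pool at its full configured size, which is a worst case because InnoDB may never fill it with a small database. The OS reserve and per-worker sizes are hypothetical placeholders to replace with measured values (for example, the RSS of httpd processes under load).

```python
# Rough RAM budget for a 2 GB box running MariaDB, httpd and PHP together.
# ASSUMPTIONS: buffer pool counted at full size; os_reserve_mb and
# per_worker_mb are hypothetical placeholders, not measured values.

def mb_left_after_buffer_pool(total_ram_mb: int, buffer_pool_mb: int) -> int:
    """RAM left for the OS, httpd and PHP once the buffer pool is full."""
    return total_ram_mb - buffer_pool_mb


def workers_that_fit(left_mb: int, os_reserve_mb: int, per_worker_mb: int) -> int:
    """How many httpd/PHP workers fit in RAM before the box starts swapping."""
    usable = left_mb - os_reserve_mb
    return max(usable, 0) // per_worker_mb


if __name__ == "__main__":
    left = mb_left_after_buffer_pool(2048, 1400)
    # Hypothetical: 200 MB for the OS and other daemons, 50 MB per PHP worker.
    print(left, workers_that_fit(left, os_reserve_mb=200, per_worker_mb=50))
```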

I’ve also turned on the performance_schema, but I don’t really know what I’m looking for. I’m not even sure the DB is the problem.

I’m tempted to upgrade the instance, but first I’d like a clearer view of where the bottleneck is and why the server dies instead of just slowing down.

Any ideas on where to start? There seem to be lots of possible tweaks and a lot of information out there.

My answer:


Close monitoring during any sort of event is crucial. As you can see, it revealed what was really happening:

Using the rackspace monitoring agent on the most recent test it appears that the CPU is maxing at 100% just before death, the memory used is peaking at about 1.6GB with free dropping to about 100MB. It looks like about 2GB of Swap Memory (total 4GB) is being used too. Standard usage appears to be about 15% CPU, 800MB memory and 400MB swap.

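PHP is well known to be rather CPU intensive. You’ve used all of the available CPU and nearly all of the available RAM, so the server is also swapping heavily (about 2 GB of swap in use), which slows everything down further.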
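The CPU exhaustion is consistent with a rough capacity estimate built from the figures in the question. This sketch rests on a loud assumption: that the ~80% of the 800 ms average app time spent in PHP is all on-CPU work, with each in-flight request occupying one vCPU. Real PHP time also includes waiting, so treat the result as a ballpark, not a measurement; the function names are illustrative.

```python
# Back-of-envelope capacity estimate from the New Relic and load-test figures.
# ASSUMPTION: time "in PHP" is pure CPU time, one vCPU per in-flight request.

def php_cpu_seconds_per_request(avg_app_time_ms: float, php_share: float) -> float:
    """CPU seconds one request spends in PHP."""
    return avg_app_time_ms / 1000.0 * php_share


def max_throughput(vcpus: int, cpu_s_per_request: float) -> float:
    """Requests per second the CPUs can finish if they do nothing else."""
    return vcpus / cpu_s_per_request


def offered_load(connections: int, window_s: float) -> float:
    """Requests per second the load test sends."""
    return connections / window_s


if __name__ == "__main__":
    cpu_s = php_cpu_seconds_per_request(800, 0.80)  # roughly 0.64 s of PHP per page
    capacity = max_throughput(2, cpu_s)              # roughly 3.1 req/s
    demand = offered_load(250, 60)                   # roughly 4.2 req/s
    print(f"capacity ~{capacity:.2f} req/s, demand ~{demand:.2f} req/s")
    if demand > capacity:
        # Arrivals outpace completions, so the queue (and latency) keeps growing.
        print("demand exceeds capacity: requests will queue and slow down")
```

Under these assumptions the load test asks for more CPU than two vCPUs can supply, which matches the progressively slower responses described in the question.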

You should first take steps to deal with that, such as opcode caching (e.g. Zend OPcache, which is bundled with PHP 5.5 and later and available from PECL for PHP 5.4) and page caching to disk (e.g. the W3 Total Cache WordPress plugin). If those don’t help enough, then it’s time to upgrade the instance.
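Before and after enabling opcode caching, it is worth confirming that the extension is actually loaded. This sketch parses the output of `php -m`, which lists "Zend OPcache" when the extension is active. One caveat, stated as a hedge: the CLI and Apache's mod_php may load different ini files, so a phpinfo() page served through httpd is the more authoritative check for the web side. The function names are illustrative.

```python
# Check whether the PHP CLI has Zend OPcache loaded by parsing `php -m`.
import subprocess


def has_opcache(php_m_output: str) -> bool:
    """True if any line of `php -m` output is exactly 'Zend OPcache'."""
    return any(line.strip() == "Zend OPcache" for line in php_m_output.splitlines())


def php_modules(php_binary: str = "php") -> str:
    """Run `php -m` and return its stdout; raises if php exits non-zero."""
    result = subprocess.run([php_binary, "-m"], capture_output=True, text=True, check=True)
    return result.stdout
```

Usage on the server: `print(has_opcache(php_modules()))`.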


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.