I wish I could be more specific about this issue and I am really asking for suggestions on where to look next.
We are running a PHP web app that is integrated with WordPress and serves about one page per second during some parts of the day. Generally things run very well on a single dedicated server (quad-core with 16GB RAM).
I am considering New Relic, and their tools alerted me to occasional page loads where PHP seems to stall. Execution reaches a seemingly arbitrary point in the call trace and then spends many seconds (or tens of seconds) on something trivial. The most trivial example was a function consisting of nothing more than a conditional followed by an echo.
There are no other errors (in the Apache/PHP error log) or slow queries I can see that coincide with these events. They also don’t seem to coincide with any particularly heavy CPU load, disk I/O or network I/O.
It happens a few times an hour, and one thing in common between all the functions that stall is that they are doing output to the page. Could something be blocking the output buffer? Or are there any other obvious culprits that might be causing this issue? What would you do next to troubleshoot?
Linux: CentOS 5.6
APC: 3.1.9 (cache is healthy, hit rate close to 100%)
The PHP-FPM SAPI has a slow log: any script that takes longer than n seconds can be logged along with a backtrace. Unfortunately, no other SAPI has this functionality. (If you were using nginx + PHP-FPM you would already have this! It’s saved my bacon more than once.)
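For reference, the slow log is switched on per pool with two directives. This is only a minimal sketch: the pool name, the 5-second threshold, and the log path are assumptions for illustration, so adjust them to your own setup.

```ini
; PHP-FPM pool configuration; pool name, threshold, and path are assumed values
[www]
; Log a PHP backtrace for any request that runs longer than this
request_slowlog_timeout = 5s
; File that receives the backtraces (hypothetical path)
slowlog = /var/log/php-fpm/www-slow.log
```

Each entry in that file shows the PHP call stack at the moment the threshold was crossed, which should tell you whether the stalled requests really are all sitting in output calls.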
The fallback seems to be to run Xdebug, but this can get hairy in a production environment. Or worse, to roll your own “debugging” scripts (see this question on Stack Overflow for bad examples).
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.