Note: this post is outdated. Use at your own risk.
I’ve been using nginx for this blog and other sites for well over a year, beginning with Ubuntu 8.10. I have had to figure out some things, but overall I have been very pleased. I have upgraded the server for each Ubuntu release since then with no real problems. Yesterday I upgraded to 10.04 and thought all was well when I went to bed last night. However, at some time during the night all of my sites began to return 504 Gateway Timeout errors. Hmm.
I did some checking in the logs and some detective work with top and such, and found that my load averages were running between 6 and 8, on a server that has averaged less than 1 for well over a year. After some research, I discovered that the php-fastcgi process was spawning child processes that did not die off when they finished. I have no idea why, as I did not change any of the nginx, php-fastcgi, or other settings. The high load averages dropped to 0 when I stopped the php-fastcgi service.
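For anyone following along, this is roughly the kind of checking I mean; the process name may show up as php-cgi or php5-cgi depending on how PHP was installed, so treat the snippet as a sketch:

    # load average that tipped me off (mine was sitting between 6 and 8)
    uptime

    # list the php-cgi children hanging around and see what they are doing
    ps aux | grep '[p]hp-cgi'
    top -b -n 1 | head -n 20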
After some documentation reading and other failed attempts, I finally solved the 504 problem by making one change in my /etc/init.d/php-fastcgi, adjusting PHP_FCGI_CHILDREN=5 to PHP_FCGI_CHILDREN=2.
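For context, the script in question is one of the common spawn-style init scripts for running php-cgi behind nginx. The sketch below is roughly its shape, not my exact file; the binary path and bind address are assumptions, so adjust them for your own setup:

    #!/bin/bash
    # /etc/init.d/php-fastcgi -- rough sketch of a typical script, not the exact file
    BIND=127.0.0.1:9000            # must match fastcgi_pass in nginx.conf
    USER=www-data
    PHP_FCGI_CHILDREN=2            # the value I dropped from 5 to 2
    PHP_FCGI_MAX_REQUESTS=125      # requests a process serves before it exits
    PHP_CGI=/usr/bin/php-cgi       # assumed path; check your install

    case "$1" in
        start)
            export PHP_FCGI_CHILDREN PHP_FCGI_MAX_REQUESTS
            start-stop-daemon --start --background --quiet \
                --chuid "$USER" --exec "$PHP_CGI" -- -b "$BIND"
            ;;
        stop)
            killall "$(basename "$PHP_CGI")"
            ;;
        restart)
            "$0" stop; sleep 1; "$0" start
            ;;
    esac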
I would really like to figure out why the child processes were not ending properly before and why they are now, and to better understand what is going on. I’ve also noticed that the site seems slower to respond, but that could just be my imagination as I have no measurements to confirm or deny it. Anyway, if anyone has any ideas, please comment.
For those interested, here are my php-fastcgi, nginx.conf, and fastcgi_params files. Also, this is a 256M slice at Slicehost.
See the last notes on this page; maybe they could be related to your problem:
I came to the solution above by first noticing that it was child processes causing the problem and adjusting the children setting to 1 instead of 5. That solved the high server load problem, but didn’t allow any child processes (and they are quite necessary with this setup). However, setting it to 2 allowed the site to work without running up high averages. I’m still trying to figure out why.
Sigh. The “solution” wasn’t one. It helped, but didn’t solve the problem. I just had to restart the php-fastcgi service because it was taking all the processor power with zombie processes. If the blog stays up and you are able to post, I’m open to any ideas.
I’m going to change PHP_FCGI_MAX_REQUESTS from 125 to 30 and see if allowing only that many requests before the process is killed and restarted will help. I originally changed this from 1000 to 125 when I first installed nginx on this server and it helped significantly.
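Concretely, that is just editing the same variable in the init script and bouncing the service (file name and variable as in the sketch above):

    # drop the per-process request limit from 125 to 30, then restart
    sudo sed -i 's/^PHP_FCGI_MAX_REQUESTS=125/PHP_FCGI_MAX_REQUESTS=30/' /etc/init.d/php-fastcgi
    sudo /etc/init.d/php-fastcgi restart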
EDIT: This seemed good at first, but once the 30-request limit was reached it killed php-fastcgi, which didn’t respawn. That meant no sites were accessible. Bummer.
Thanks. I’ll read through that and see if it helps.
These parts seemed helpful.
I set the children to 0 and the max_requests back to the default of 1000. It stayed up while I tested a little, but I want to monitor and will probably adjust more throughout the next 24 hours.
EDIT: That worked poorly. Whether I set max_requests to 30 or 1000, whenever the limit is reached the php-fastcgi process dies. That makes all the sites return a 502 Bad Gateway error. Yeah, I can just restart the process, but I shouldn’t have to do that every hour (or less).
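When the 502 shows up, this is how I confirm the FastCGI backend is actually gone before restarting it (port 9000 is an assumption based on the usual bind address):

    # nothing listening here means nginx has no backend to hand PHP requests to
    sudo netstat -lnp | grep ':9000'
    ps aux | grep '[p]hp-cgi'

    # bring it back up
    sudo /etc/init.d/php-fastcgi restart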
Hmm. Could a MySQL bottleneck cause this? Perhaps queries are backing up and taking all of the allotted processes? Well, I’m out of other ideas, so I’ll look into it. (Honestly, I don’t know if anyone is reading this, but it is handy for me to keep notes.)
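If anyone wants to look at the same thing, the quick check for backed-up queries is the MySQL process list (swap in your own credentials):

    # long-running entries in the Time column suggest queries are piling up
    mysqladmin -u root -p processlist

    # or, with the full query text
    mysql -u root -p -e 'SHOW FULL PROCESSLIST;'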
I fixed the problem by moving to a new server at a different host. This wasn’t entirely a setup issue. The ultimate problem was that php5 got installed alongside php-fastcgi, and they kept fighting over which one would process PHP. In the end, I was already planning a host change and this made it a convenient time; I did, however, get the old server running properly again.
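For anyone who hits the same conflict, this is roughly how I would look for a second PHP handler competing for requests; package names vary between releases, so treat it as a sketch:

    # look for more than one PHP handler installed (php5-cgi, libapache2-mod-php5, etc.)
    dpkg -l | grep -i php

    # see which processes are actually answering PHP and what owns port 9000
    ps aux | grep '[p]hp'
    sudo netstat -lnp | grep ':9000'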