By what criteria do you tune timeouts in HAProxy config?

Jeremy Wadhams asked:

When configuring HAProxy, how do you decide what values to assign to the timeouts? I’ve read a half dozen samples in various blogs, and everyone uses different timeouts and no one discusses why.

HAProxy seems specifically concerned with client, connect, and server; it throws a warning if you leave them completely unset:

While not properly invalid, you will certainly encounter various problems
with such a configuration. To fix this, please ensure that all following
timeouts are set to a non-zero value: 'client', 'connect', 'server'.

The documentation is unhelpful in this regard: it suggests “slightly above multiples of 3 seconds” but not why you’d choose a multiple of 1 vs 100 or 42.

The RPM I’m using (Amazon Linux repository) sets these defaults:

timeout connect         10s
timeout client          1m
timeout server          1m

Two of which are exact multiples of 3 seconds, violating the only official advice I’ve seen.

If you don’t have specific tuning advice, maybe an easier question is: what should I expect to go wrong with really short or really long timeouts?

My answer:

The TCP RTO (retransmission timeout) starts at three seconds (RFC 1122). If a transmitted packet hasn’t been acknowledged within that time, it’s assumed to be lost and is retransmitted. This is almost certainly what the documentation’s author is referring to: a timeout slightly above a multiple of three seconds leaves room for at least one retransmitted packet to arrive before HAProxy gives up. (Note that the RTO gets tuned up or down dynamically by various algorithms, outside the scope of this question.)

Keep in mind that this really only applies to connections between your frontend server and the clients (i.e. web users). In normal scenarios, the connections between HAProxy and your backend servers should be on a LAN and you should use much shorter timeouts, so that malfunctioning backends get taken out of service sooner.

As for your web users, some of them may be on very high latency connections, such as satellite, and may experience higher than normal retransmits due to this. The RTT on a connection where a satellite is in use may exceed 2000 ms even if all is well.

With all this in mind, you will generally want very short timeouts for timeout connect and very long ones for timeout client.
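
As a rough sketch of that split, the two settings might look like this inside a defaults section. The values are illustrative assumptions, not figures from the HAProxy documentation, and this is only a fragment: timeout server still has to be set as well.

```
defaults
    mode http
    # Assumes backends sit on a LAN: slightly above 3s, so one lost SYN
    # can be retransmitted before the connection attempt is abandoned.
    timeout connect 4s
    # Assumes some clients are on slow, high-latency links (e.g. satellite),
    # so allow a generous period of client inactivity.
    timeout client  1m
```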

For timeout server, this depends on your web application. When setting the timeout, consider the complexity of the web app being served, and how long it might take in the worst case to process a complex request. If in doubt, raise the value.
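
One way to handle this, sketched under assumptions, is to set a moderate timeout server in defaults and raise it only for the backend that serves slow requests. The backend name, server name, address, and values below are all hypothetical.

```
defaults
    mode http
    # Assumed baseline: ordinary requests finish well within 30 seconds.
    timeout server 30s

# Hypothetical backend for slow, complex requests (e.g. report generation).
backend reports
    # Override the default because worst-case requests may take minutes.
    timeout server 5m
    server app1 192.0.2.10:8080 check
```

Keeping the long timeout on a single backend means the other backends still stop waiting on a stalled server after the shorter default.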

View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.