Using the ALOHA load balancer and HAProxy, it is easy to protect any application or web server against unexpected traffic spikes.
Introduction
The response time of a web server is directly related to the number of requests it has to manage at the same time, and the relationship is not linear: it looks closer to exponential.
The graph below shows a server’s response time compared to the number of simultaneous users browsing the website:
Simultaneous connections limiting
Simultaneous connections limiting is a threshold (the limit) that a load balancer treats as the maximum number of requests to send to a backend server at the same time.
Since HAProxy provides this feature, the ALOHA load balancer does too.
Smart handling of request peaks with HAProxy
The goal is to prevent too many requests from being forwarded to an application server, by setting a limit on simultaneous requests for each server in the backend.
Fortunately, HAProxy does not reject requests over the limit, unlike some other load balancers.
HAProxy uses a queueing system and waits for the backend server to become available again. This mechanism adds a small delay to queued requests, but it has a few advantages:
- no client requests are rejected
- every request is served faster than it would be by an overloaded backend server
- the delay is still acceptable (a few ms in the queue)
- your server won’t crash because of the spike
Simultaneous request limiting occurs on the server side: HAProxy limits the number of concurrent requests sent to the server regardless of what happens on the client side.
HAProxy will never refuse a client connection until the underlying servers run out of capacity.
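One detail worth knowing: queued requests do not wait forever. HAProxy’s timeout queue directive bounds the time a request may spend waiting for a free server slot; once it expires, the client receives a 503 error. A minimal sketch, where the 5s value is only an illustrative choice to adapt to your application:

```
backend APPLI1
    mode http
    # maximum time a request may wait in the queue for a free
    # server slot; past this delay, HAProxy returns a 503
    timeout queue 5s
```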
Concrete numbers
If you read the graph above carefully, you can easily see that the more requests your server has to process at the same time, the longer each request takes to process.
The table below summarizes the time spent by our example server to process 250 requests with different simultaneous request limits:
| Number of requests | Simultaneous requests limit | Average time per request (ms) | Longest response time (ms) |
|---|---|---|---|
| 250 | 10 | 9 | 225 |
| 250 | 20 | 9 | 112 |
| 250 | 30 | 9 | 75 |
| 250 | 50 | 25 | 125 |
| 250 | 100 | 100 | 250 |
| 250 | 150 | 225 | 305 |
| 250 | 250 | 625 | 625 |
It’s up to the website owner to determine the best limit to set in HAProxy.
You can approximate it with HTTP benchmarking tools, comparing average response times while sending a constant number of concurrent requests to your backend server.
From the example above, we can see we would get the best out of this backend server by setting the limit to 30.
Setting the limit too low means requests queue for longer than necessary, while setting it too high is counter-productive: each request slows down once the server’s capacity is exceeded.
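For example, here is one way to run such a test with ab (ApacheBench); the URL and the concurrency values are placeholders to adapt to your own environment:

```
# sweep several concurrency levels against the backend server
# and compare the mean time per request reported by ab;
# 250 requests total, as in the table above
for c in 10 20 30 50 100 150 250; do
    echo "== concurrency $c =="
    ab -n 250 -c $c http://srv1/ | grep "Time per request"
done
```

The concurrency level with the lowest mean time per request is a good candidate for the maxconn value.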
HAProxy simultaneous requests limiting configuration
Simultaneous request limiting is configured with the maxconn keyword on the server line definition.
Example:
```
frontend APPLI1
    bind :80
    mode http
    option http-server-close
    default_backend APPLI1

backend APPLI1
    balance roundrobin
    mode http
    server server1 srv1:80 maxconn 30
    server server2 srv2:80 maxconn 30
```
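With maxconn 30 on each server line, HAProxy forwards at most 30 concurrent requests to each server; any request arriving above that limit waits in the backend queue until a slot is freed. The value 30 matches the optimum found in the table above.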
Thanks for the post!
Do you have an example of how to test the backend server to find a good value?
I usually inject the same client traffic pattern and observe the application’s behavior and response times with different values.
Baptiste
Which tools do you recommend?
ab, httperf, httpress, inject, etc…
There are many tools available for this purpose.
Baptiste
Thanks for the post!
Could you give some examples of how to test the backend server to find the best maxconn?
It seems that selecting a good value for maxconn could be automated by log analysis:
https://github.com/gforcada/haproxy_log_analysis/issues/6
Great post! Can you specify the machine you tested these values on?
I am interested in how the number of CPU cores affects the best maxconn value.
The number of CPU cores is irrelevant here. However, if you run with multiple processes in the backend, you’ll have to divide your maxconn setting by the number of processes, which is not very convenient. That’s one of the reasons we often recommend not using multiple processes for backends.
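As a hypothetical illustration of that division (assuming the legacy nbproc multi-process model and reusing the servers from the example above): to keep an overall budget of 30 concurrent requests per server with two processes, each process gets half, since each process keeps its own connection counters.

```
global
    # two worker processes; each keeps its own connection counters
    nbproc 2

backend APPLI1
    balance roundrobin
    mode http
    # budget of 30 concurrent requests per server, divided by
    # the 2 processes: 15 per process
    server server1 srv1:80 maxconn 15
    server server2 srv2:80 maxconn 15
```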