Sizing recommendations
The operating system and its tuning have a strong impact on the overall performance of the load balancer. Typical CPU usage figures generally show:
15% of the processing time spent in HAProxy versus 85% in the kernel in TCP or HTTP close mode
30% for HAProxy versus 70% for the kernel in HTTP keep-alive mode
Usage can vary depending on whether the focus is on bandwidth, request rate, connection concurrency, or SSL performance. This section provides a few points to consider as you set up your configuration.
Evaluating the costs of processing requests
Keep in mind that every operation comes with a cost: each individual operation adds its overhead on top of the others, and that overhead can be negligible in some circumstances and dominant in others.
When processing requests from a connection, we observe that:
Forwarding data costs less than parsing request or response headers
Parsing request or response headers costs less than establishing and then closing a connection to a server
Establishing and closing a connection costs less than a TLS resume operation
A TLS resume operation costs less than a full TLS handshake with a key computation
An idle connection costs less CPU than a connection whose buffers hold data
A TLS context costs even more memory than a connection with data
In practice, it is cheaper to process payload bytes than header bytes, so it is easier to achieve high network bandwidth with large objects (few requests per volume unit) than with small objects (many requests per volume unit). This explains why maximum bandwidth is always measured with large objects, while request or connection rates are measured with small objects.
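To make this concrete, here is a back-of-the-envelope sketch of how object size shifts the bottleneck between request rate and bandwidth. The 1 KB and 1 MB object sizes are hypothetical; the 750k requests per second and 35Gb/s ceilings are the keep-alive figures reported in the tables below.

```python
# Back-of-the-envelope arithmetic: object size decides whether request
# processing or raw bandwidth becomes the bottleneck first.
# The 1 KB / 1 MB object sizes are illustrative assumptions; 750k req/s and
# 35 Gb/s are the keep-alive ceilings from the tables in this section.

def payload_gbps(requests_per_sec: float, object_bytes: int) -> float:
    """Payload bandwidth in Gb/s for a given request rate and object size."""
    return requests_per_sec * object_bytes * 8 / 1e9

# Small objects: the request-processing limit is hit long before the wire is full.
print(payload_gbps(750_000, 1_000))   # ~6 Gb/s of payload at 750k req/s with 1 KB objects

# Large objects: the wire saturates at a comparatively modest request rate.
print(35e9 / (1_000_000 * 8))         # ~4,375 req/s fill 35 Gb/s with 1 MB objects
```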
Some operations scale well across multiple processes spread over multiple processors, such as:
The request rate over persistent connections: This does not involve much memory or network bandwidth and does not require access to locked structures.
TLS key computation: This is completely CPU-bound.
TLS resume (moderately well): This operation reaches its limits around 4 processes, when the overhead of accessing the shared table offsets the small gains expected from more power.
Other operations do not scale as well, such as:
Network bandwidth: The CPU is rarely the bottleneck for large objects.
Connection rate: This is due to a few locks in the system when dealing with the local ports table.
Optimizing performance
The performance values you can expect from a very well-tuned system fall in the following ranges (a rough capacity sketch follows this list):
TCP connections per GB of RAM: 29000
TLS frontend connections per GB of RAM: 7900
TLS end to end connections per GB of RAM: 7100
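As a quick way to turn these per-GB figures into a first capacity estimate, the sketch below assumes memory is the only limiting factor (CPU, network interfaces, and tuning can lower the real numbers) and uses a hypothetical 16 GB machine as an example.

```python
# Rough concurrency estimate from the per-GB-of-RAM figures above.
# Assumes memory is the only limit; actual capacity also depends on CPU,
# NICs, and operating system tuning.

CONNECTIONS_PER_GB = {
    "tcp": 29_000,
    "tls_frontend": 7_900,
    "tls_end_to_end": 7_100,
}

def max_connections(ram_gb: float, mode: str) -> int:
    """Upper bound on concurrent connections for a given RAM size and traffic mode."""
    return int(ram_gb * CONNECTIONS_PER_GB[mode])

# Hypothetical example: 16 GB of RAM dedicated to the load balancer.
for mode in CONNECTIONS_PER_GB:
    print(mode, max_connections(16, mode))
# tcp 464000, tls_frontend 126400, tls_end_to_end 113600
```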
Results from tests with HAProxy Enterprise 2.1r1 (kernel 4.19, 2x i40g network interfaces, Core i7-8700) are the following:
Note
It is important to take these values as orders of magnitude and to expect significant variations in any direction based on the processor, IRQ setting, memory type, network interface type, operating system tuning, and so on.
Frontend performance** (request cached object, no forward to server)
Frontend protocol | Max requests per second | Max bandwidth |
---|---|---|
HTTPv1.1 connection close | 315k | 35Gb/s |
HTTPv1.1 keepalive | 750k | 35Gb/s |
HTTPSv1.1 keepalive* | 670k | 34Gb/s |
HTTPSv2.0* | 740k | 29Gb/s |
Edge Load Balancer performance** (HTTPSv2.0 frontend protocol*)
Server-side protocol | Max requests per second | Max bandwidth |
---|---|---|
HTTPv1.1 keepalive | 580k | 21Gb/s |
HTTPSv1.1 keepalive* | 450k | 19Gb/s |
HTTPSv2.0* | 500k | 12Gb/s |
Classic/legacy Load Balancer performance** (HTTPv1.1 keepalive on server side)
Frontend protocol | Max requests per second | Max bandwidth |
---|---|---|
HTTPv1.1 connection close | 195k | 33Gb/s |
HTTPv1.1 keepalive | 540k | 33Gb/s |
HTTPSv1.1 keepalive* | 360k | 30Gb/s |
TCP/HTTP Filtering performance**
Frontend protocol | Requests per second |
---|---|
HTTPv1.1 connection close | 380k |
HTTPSv1.1 keepalive | 380k |
HTTPSv2.0* | 1350k |
TLS key computation performance
Key type | Keys per second |
---|---|
RSA 2048 | 9000 |
RSA 4096 | 1500 |
ECDSA 256 | 25000 |
Note
(*) using TLSv1.2 with a 2048-bit key and the ECDHE-RSA-AES256-GCM-SHA384 cipher
(**) using 1000 concurrent transactions
Guidelines about sizing
There are a few rules of thumb to keep in mind in your sizing exercise:
The request rate is divided by 10 between TLS keep-alive and TLS resume, and again between TLS resume and TLS negotiation, while it is only divided by 3 between HTTP keep-alive and HTTP close.
A high-frequency core with AES instructions can handle around 5 Gbps of AES-GCM per core (see the sketch after this list).
Having more cores is rarely helpful (except for TLS) and can even be counter-productive due to the lower frequency; in general, it is better to have a small number of high-frequency cores.
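To illustrate how the AES-GCM rule of thumb translates into a core count, here is a minimal sketch; the 5 Gbps per core value is an order of magnitude, and the 20 Gb/s target is a hypothetical example, not a figure from the tests above.

```python
import math

# Order-of-magnitude figure from the rule of thumb above: a high-frequency
# core with AES instructions handles roughly 5 Gbps of AES-GCM.
AES_GCM_GBPS_PER_CORE = 5.0

def cores_for_tls_bandwidth(target_gbps: float) -> int:
    """Cores needed for the AES-GCM symmetric crypto of a target traffic volume."""
    return math.ceil(target_gbps / AES_GCM_GBPS_PER_CORE)

# Hypothetical target: terminating about 20 Gb/s of TLS traffic needs on the
# order of 4 cores for the symmetric crypto alone, before handshakes and
# HTTP processing are accounted for.
print(cores_for_tls_bandwidth(20))  # 4
```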
On the same server, HAProxy is able to saturate approximately:
5-10 static file servers or caching proxies
100 anti-virus proxies
100-1000 application servers depending on the technology in use