Why are health checks crucial for high availability?

Health checks are crucial because they limit downtime. By automatically detecting unresponsive servers, HAProxy can stop sending traffic to them and route it to healthy servers instead, so users are far less likely to receive an error.

What are the different types of health checks HAProxy offers?

HAProxy offers three main types of health checks:

Active: HAProxy periodically initiates its own connection to the server (e.g., a TCP connect or an HTTP request) to test its health.
Passive: HAProxy monitors real user connections. If it observes that connections to a server are failing, it marks the server as down.
Agent: A small program on the backend server reports detailed internal status (like CPU load) back to HAProxy.

When should I use an agent health check instead of an active or passive one?

You should use an agent health check when a simple external check (like a TCP or HTTP check) isn't enough to know if the application is truly healthy. It's best for monitoring internal server states like CPU load, memory usage, disk space, or database connectivity.

How can I monitor the health status of my servers in real time?

You can monitor server health in real time using the built-in HAProxy Stats Page. It's a web dashboard that provides a live table of your servers and their current status, such as UP or DOWN, and details from the last health check.

A Guide to HAProxy Health Checks for High Availability

HAProxy makes your web applications highly available by spreading requests across a pool of backend servers. If one or even several servers fail, clients can still use your app as long as there are other servers still running.

The caveat is that HAProxy needs to know which servers are healthy. That’s why health checks are crucial. Health checks automatically detect when a server becomes unresponsive or begins to return errors; HAProxy can then temporarily remove that server from the pool until it begins to act normally again. Without health checks, HAProxy has no way of knowing when a server has become dysfunctional.

Note

Health checks complement other fail-safe measures in HAProxy such as retries and redispatches. Read our blog post HAProxy Layer 7 Retries and Chaos Engineering to learn more.

You have access to three types of health checks: active, passive, and agent. Let’s learn about each one.

Active health checks

The simplest solution is to poll your backend servers by attempting to connect at a defined interval. This is known as an active health check. If HAProxy doesn’t get a response back, it counts the check as failed, and after a certain number of consecutive failed checks it removes the server from the rotation.

If you want to keep the default settings, configuring an active health check involves simply adding a check parameter to a server line in a backend. In the following example, we’ve enabled active health checks for each server:

	backend webservers
	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check


HAProxy will try to establish a TCP connection every two seconds. After three failed connections, the server is removed temporarily, until HAProxy gets at least two successful connections, after which it reinstates the server into the backend. You can customize these settings, changing the interval, the number of failed checks that trigger a removal, or the number of successful checks that reinstate the server.

The inter parameter changes the interval between checks; it defaults to two seconds. The fall parameter sets how many failed checks are allowed; it defaults to three. The rise parameter sets how many passing checks there must be before returning a previously failed server to the rotation; it defaults to two. In the example below, we’ve set new values:

server server1 192.168.50.2:80 check inter 10s fall 5 rise 5
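
To make the interplay between fall and rise concrete, here is a minimal Python sketch of the counting logic described above. It is a simplified model for illustration only, not HAProxy's actual implementation; the ServerState class and its record method are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Simplified model of rise/fall counting for one server."""
    rise: int = 2    # consecutive passes needed to bring a DOWN server back UP
    fall: int = 3    # consecutive failures needed to mark an UP server DOWN
    up: bool = True
    streak: int = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one check result and return whether the server is now UP."""
        if check_passed == self.up:
            # The result agrees with the current state, so the streak resets.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                # Enough contradicting results in a row: flip the state.
                self.up = not self.up
                self.streak = 0
        return self.up
```

With the defaults, three failed checks in a row take the server out of the rotation and two passing checks in a row bring it back. Because checks run once per inter, detecting a failure takes roughly fall multiplied by inter, or about six seconds with the default settings, ignoring connection timeouts.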


While attempting to connect helps determine whether an application is up and running, it can’t tell you whether the app is behaving normally. For web applications, you can switch to using an HTTP health check instead. An HTTP health check sends an HTTP request and expects a successful response in the 2xx or 3xx range, such as 200 OK or 302 Found.

Just add option httpchk to the backend, as shown:

	backend webservers
	    option httpchk
	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check


By default, HAProxy makes a GET request to the URL path /, but you can change that by adding an http-check send line. Below, we send a GET request to the URL path /health. A common technique is to program the /health endpoint to do a thorough check of your application and its dependencies and then return a single successful response if everything looks good.

	backend webservers
	    option httpchk
	    http-check send meth GET uri /health
	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check
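
On the application side, a /health endpoint of the kind described above might look something like the following Python sketch. It uses only the standard library. The probe functions database_ok and cache_ok are hypothetical placeholders, and port 8080 is an assumption (the servers in this post listen on port 80). The endpoint returns 200 only if every probe passes and 503 otherwise, which HAProxy treats as a failed check because 503 is outside the 2xx and 3xx range.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


# Hypothetical dependency probes. Replace the bodies with real checks
# (database ping, cache ping, free disk space, and so on).
def database_ok() -> bool:
    return True


def cache_ok() -> bool:
    return True


CHECKS: Dict[str, Callable[[], bool]] = {"database": database_ok, "cache": cache_ok}


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (HTTP status, body): 200 if every probe passes, else 503."""
    failed = []
    for name, probe in checks.items():
        try:
            ok = probe()
        except Exception:
            ok = False  # a probe that raises counts as a failure
        if not ok:
            failed.append(name)
    if failed:
        return 503, "failed: " + ", ".join(failed)
    return 200, "success"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint. In a real application you would more likely add the /health route to the web framework you already use, so that the check exercises the same process that serves client traffic.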


To send a POST request with a JSON body, use this form, which includes a Content-Type request header and a message body:

	backend webservers
	    option httpchk
	    http-check send meth POST uri /health hdr Content-Type application/json body "{ \"foo\": \"bar\" }"
	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check


While it is a common pattern to have the server do a thorough check on its end, you can also configure HAProxy to perform several checks itself. In the example below, we define two checks, both of which must be successful. Each block starts with http-check connect.

	backend webservers
	    option httpchk

	    http-check connect
	    http-check send meth GET uri /health
	    http-check expect status 200

	    http-check connect
	    http-check send meth GET uri /health2
	    http-check expect status 200

	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check


The http-check connect directive also lets you connect to the server using SSL and specify the protocol, such as HTTP/2, by using ALPN, as shown below:

http-check connect ssl alpn h2,http/1.1


You can also tell HAProxy to expect a certain status code, or to require that the HTTP response body contain a given string. Use the http-check expect directive with either the status or string keyword. In the following example, the application must return a 200 OK response status to be considered healthy:

	backend webservers
	    option httpchk
	    http-check send meth GET uri /health
	    http-check expect status 200
	    server server1 192.168.50.2:80 check
	    server server2 192.168.50.3:80 check
	    server server3 192.168.50.4:80 check


Or, you can require the response body to contain a case-sensitive string, such as success:

http-check expect string success


HAProxy also supports other protocol-specific health checks for LDAP, MySQL, PostgreSQL, Redis, and SMTP.

Passive health checks

Whereas an active health check continually polls the server with either a TCP connection or an HTTP request, a passive health check monitors live traffic for errors. You can enable this mode by adding the check, observe, error-limit, and on-error parameters to a server line, as shown below:

	backend webservers
	    option httpchk
	    http-check send meth GET uri /health
	    server server1 192.168.50.2:80 check observe layer7 error-limit 50 on-error mark-down


Set the observe parameter to layer4 to monitor all TCP connections for problems or to layer7 to watch all HTTP responses for errors. Successful responses are those that have an HTTP status code in the range 100-499, 501 or 505. The error-limit parameter sets how many consecutive requests can have errors before the on-error rule kicks in. Here, the rule marks the server as down.
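
The counting rule for layer7 observation can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not HAProxy's code; is_layer7_success and first_trip_index are hypothetical helper names.

```python
from typing import Iterable, Optional


def is_layer7_success(status: int) -> bool:
    """Apply the rule from the text: 100-499, 501 and 505 count as successes."""
    return 100 <= status <= 499 or status in (501, 505)


def first_trip_index(statuses: Iterable[int], error_limit: int) -> Optional[int]:
    """Return the index of the response at which the consecutive error
    count reaches error_limit, or None if it never does."""
    consecutive = 0
    for index, status in enumerate(statuses):
        if is_layer7_success(status):
            consecutive = 0  # any successful response resets the count
        else:
            consecutive += 1
            if consecutive >= error_limit:
                return index
    return None
```

Note that a 404 counts as a success under this rule, because it shows the server is responding, while a 500 or 503 counts as an error. Since only consecutive errors are counted, a server that fails intermittently may never reach the limit.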

Passive health checks always coexist with active health checks, with the latter doing its normal polling while also being responsible for reviving a server after it has been marked as down by a passive health check. In other words, you get both types of checking simultaneously. The benefit of that is that you will detect when only a part of your web application is malfunctioning, even if the active health check URL isn’t targeting that part. For example, if active health checks monitor the /health URL, but actual clients are getting errors on the /cart URL, HAProxy will detect that.

Beware that the active health checks will revive the server sooner or later, even if the /cart URL is still malfunctioning. One way to keep an unhealthy server down for longer is to set the rise parameter higher, so that more consecutive successful active checks are needed before the server is reinstated. Another solution is to turn your passive health check into a full-blown circuit breaker by adding the slowstart parameter, which works well for backend services. We show how to do that in the blog post Circuit Breaking in HAProxy.
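
As a rough illustration of how rise affects recovery time, the sketch below multiplies the check interval by the number of passing checks required. It assumes the server is checked once per inter while it is down and that every check passes; seconds_until_revived is a hypothetical helper, not part of HAProxy.

```python
def seconds_until_revived(inter_seconds: float, rise: int) -> float:
    """Approximate time a server marked down stays out of the rotation,
    assuming every active check passes from the moment it is marked down."""
    return inter_seconds * rise


# Compare a few rise values at the default two-second interval.
for rise in (2, 10, 30):
    print(f"rise {rise}: about {seconds_until_revived(2, rise)} seconds")
```

With the default interval of two seconds, the default rise of 2 lets the server return after about four seconds, whereas a rise of 30 keeps it out for about a minute.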

Agent health checks

While actively polling servers and observing live traffic are great ways to detect failures, they don’t give you a rich sense of a server’s overall state. For example, you can’t easily tell how much CPU load is being placed on it or if it’s running dangerously low on disk space.

With HAProxy, you can communicate with an external agent, which is software running on the server that’s separate from the application being load balanced. Since the agent has full access to the system, it can check the machine’s vitals more closely.

Check the sample project in GitHub to see a working example.

External agents can do more than just respond back with a binary up or down status. They can send signals to HAProxy that update its state, such as:

mark the server as up or down
put the server into maintenance mode
change the amount of traffic flowing to the server
increase or decrease the maximum number of clients that can connect concurrently

The agent will invoke an action when it detects a particular condition on the server. The communication protocol between the agent and HAProxy is simply ASCII text sent over a TCP connection, which makes it easy to write your own external agent program. The agent might send back any of the following (note that the end-of-line character, \n, is required):

Agent sends back	Result
down\n	The server is put into the down state
up\n	The server is put into the up state
maint\n	The server is put into maintenance mode
ready\n	The server is taken out of maintenance mode
50%\n	The server’s weight is set to 50% of its configured value
maxconn:10\n	The server’s maximum number of connections is set to 10
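
Because the protocol is plain text over TCP, an agent can be very small. The following Python sketch is one possible agent, not the sample project mentioned above. It reports a reduced weight when the machine is busy and marks the server down when it is overloaded. The load thresholds are arbitrary assumptions, the helper names are hypothetical, and os.getloadavg is only available on Unix-like systems. The sketch relies on HAProxy accepting several words in one reply separated by spaces, so that "up 50%" both marks the server up and sets its weight.

```python
import os
import socketserver


def reply_for_load(load_per_cpu: float) -> str:
    """Map load per CPU to an agent reply (thresholds are illustrative)."""
    if load_per_cpu >= 2.0:
        return "down\n"
    if load_per_cpu >= 1.0:
        return "up 50%\n"
    return "up 100%\n"


def current_load_per_cpu() -> float:
    """One-minute load average divided by the CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


class AgentHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # HAProxy connects, reads one line of ASCII text, and disconnects.
        reply = reply_for_load(current_load_per_cpu())
        self.request.sendall(reply.encode("ascii"))


def serve(port: int = 3000) -> None:
    """Listen for agent checks until interrupted."""
    with socketserver.TCPServer(("0.0.0.0", port), AgentHandler) as server:
        server.serve_forever()
```

Calling serve() starts the agent on port 3000. Keeping the decision logic in reply_for_load, separate from the socket handling, makes it easy to test the thresholds without opening a connection.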

On the HAProxy side, add an agent-check parameter to enable communication with the agent program.

	backend webservers
	    balance roundrobin
	    server server1 192.168.50.2:80 check weight 100 agent-check agent-inter 5s agent-addr 192.168.50.2 agent-port 3000


There are a few other parameters shown here, so let’s describe them. Use agent-inter to set the interval of the checks. Set the agent-addr and agent-port parameters to the IP address and port where the agent is listening. Using an external agent gives you flexibility in how a server is checked and provides more ways to react. For example, instead of shutting off a server, you might decide to simply dial back the amount of traffic it receives.

The HAProxy Enterprise Real-time Dashboard

When you operate a non-trivial infrastructure, it soon becomes obvious that you need a consolidated view of your system. HAProxy Enterprise has a dashboard called the Real-time Dashboard, where you can observe the current status of all of your services.

Having a central management dashboard makes health monitoring much easier. You can easily filter the list and each server can be enabled and disabled with a button click. You can also apply changes to batches of servers without needing to update each one individually.

A server health check is an automated test HAProxy performs to verify that a backend server is responsive and working correctly. If a server fails a health check, HAProxy temporarily stops sending traffic to it. When the server starts passing checks again, HAProxy automatically adds it back into the rotation.

Conclusion

In this post, you learned how HAProxy provides three types of health checks: active health checks, passive health checks, and agent health checks. Enabling health checks ensures that users aren’t affected by malfunctioning servers.
Learn more about health checks by registering for our webinar: “HAProxy Skills Lab: Health Checking Servers”.
