HAProxy provides active, passive, and agent health checks.
HAProxy makes your web applications highly available by spreading requests across a pool of backend servers. If one or even several servers fail, clients can still use your app as long as there are other servers still running.
The caveat is, HAProxy needs to know which servers are healthy. That’s why health checks are crucial. Health checks automatically detect when a server becomes unresponsive or begins to return errors; HAProxy can then temporarily remove that server from the pool until it begins to act normally again. Without health checks, HAProxy has no way of knowing when a server has become dysfunctional.
Note: Health checks complement other fail-safe measures in HAProxy such as retries and redispatches. Read our blog post HAProxy Layer 7 Retries and Chaos Engineering to learn more.
You have access to three types of health checks: active, passive, and agent. Let’s learn about each one.
Active Health Checks
The simplest solution is to poll your backend servers by attempting to connect at a defined interval. This is known as an active health check. If a connection attempt fails, HAProxy counts it as a failed check, and after a certain number of consecutive failures it removes the server from the rotation.
If you want to keep the default settings, configuring an active health check involves simply adding a check parameter to a server line in a backend. In the following example, we’ve enabled active health checks for each server:
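A minimal sketch of such a backend, assuming the hypothetical name webservers and placeholder private addresses:

```
backend webservers
    # "check" turns on the default TCP connect check for each server
    server server1 192.168.50.2:80 check
    server server2 192.168.50.3:80 check
    server server3 192.168.50.4:80 check
```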
HAProxy will try to establish a TCP connection every two seconds. After three failed connections, the server is removed, temporarily, until HAProxy gets at least two successful connections, after which it reinstates the server into the backend. You can customize these settings, changing the interval, number of failed checks that trigger a removal, or the number of successful checks that reinstate the server.
The inter parameter changes the interval between checks; it defaults to two seconds. The fall parameter sets how many failed checks are allowed; it defaults to three. The rise parameter sets how many passing checks there must be before returning a previously failed server to the rotation; it defaults to two. In the example below, we’ve set new values:
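One way this could look, with illustrative values only (the 10s interval and the counts of 5 are assumptions chosen for the example, not recommendations):

```
backend webservers
    # check every 10 seconds; 5 failures remove the server, 5 successes restore it
    server server1 192.168.50.2:80 check inter 10s fall 5 rise 5
```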
While attempting to connect helps determine whether an application is up and running, it can’t tell you whether the app is behaving normally. For web applications, you can switch to using an HTTP health check instead. An HTTP health check sends an HTTP request and expects a successful response in the 2xx or 3xx range, such as 200 OK or 302 Found.
Enable it by adding option httpchk to the backend, as shown:
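A short sketch, again assuming a backend named webservers with a placeholder address:

```
backend webservers
    # switch the active check from a TCP connect to an HTTP request
    option httpchk
    server server1 192.168.50.2:80 check
```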
By default, HAProxy sends the request to the URL path / (the configuration manual documents OPTIONS as the method used when none is set), but you can change both the method and the path by adding an http-check send line. Below, we send a GET request to the URL path /health. A common technique is to program the /health endpoint to do a thorough check of your application and its dependencies and then return a single successful response if everything looks good.
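As a sketch, assuming the application exposes a /health endpoint as described above:

```
backend webservers
    option httpchk
    # request GET /health instead of the default check request
    http-check send meth GET uri /health
    server server1 192.168.50.2:80 check
```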
To send a POST request with a JSON body, use this form, which includes a Content-Type request header and a message body:
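A possible form is sketched here. The JSON payload is a made-up placeholder, and single quotes are used so that the double quotes inside the body need no escaping:

```
backend webservers
    option httpchk
    # POST with a Content-Type header and a small JSON body (placeholder payload)
    http-check send meth POST uri /health hdr Content-Type application/json body '{"key":"value"}'
    server server1 192.168.50.2:80 check
```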
While it is a common pattern to have the server do a thorough check on its end, you can also configure HAProxy to perform several checks itself. In the example below, we define two checks, both of which must be successful. Each block starts with an http-check connect line.
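A sketch of two such blocks follows. The second path, /health2, is a hypothetical endpoint invented for illustration:

```
backend webservers
    option httpchk

    # first check: open a connection and request /health
    http-check connect
    http-check send meth GET uri /health
    http-check expect status 200

    # second check: open a new connection and request /health2
    http-check connect
    http-check send meth GET uri /health2
    http-check expect status 200

    server server1 192.168.50.2:80 check
```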
The http-check connect directive also lets you connect to the server using SSL and specify the protocol, such as HTTP/2, by using ALPN, as shown below:
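One way this might be written is sketched below. The host name www.example.com is a placeholder, and verify none is an assumption made only to keep the sketch short; it skips certificate verification and is not suitable for production:

```
backend webservers
    option httpchk
    # run the check over TLS and negotiate HTTP/2 through ALPN
    http-check connect ssl alpn h2
    http-check send meth GET uri /health ver HTTP/2 hdr Host www.example.com
    # "verify none" disables certificate verification (lab use only)
    server server1 192.168.50.2:443 check verify none
```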
Something else that you can do is tell HAProxy to expect a certain status code to be returned or that a string should be included in the HTTP response body. Use the http-check expect directive with either the status or string keyword. In the following example, the application must return a 200 OK response status to be considered healthy:
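A minimal sketch, reusing the /health endpoint assumed earlier:

```
backend webservers
    option httpchk
    http-check send meth GET uri /health
    # only a 200 response counts as a passing check
    http-check expect status 200
    server server1 192.168.50.2:80 check
```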
Or, you can require the response body to contain a case-sensitive string, such as success:
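Sketched with the same assumed backend, the string form looks like this:

```
backend webservers
    option httpchk
    http-check send meth GET uri /health
    # the response body must contain the exact text "success"
    http-check expect string success
    server server1 192.168.50.2:80 check
```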
HAProxy also supports other protocol-specific health checks for LDAP, MySQL, PostgreSQL, Redis, and SMTP.
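These are enabled with their own option directives in the backend: option ldap-check, option mysql-check, option pgsql-check, option redis-check, and option smtpchk. As an illustration, a Redis check might be sketched like this (the backend name and address are placeholders):

```
backend redis_servers
    mode tcp
    # send a Redis PING and expect a PONG in reply
    option redis-check
    server redis1 192.168.50.10:6379 check
```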
Passive Health Checks
Whereas an active health check continually polls the server with either a TCP connection or an HTTP request, a passive health check monitors live traffic for errors. You can enable this mode by adding the observe, error-limit, and on-error parameters to a server line, as shown below:
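A sketch of such a server line follows; the limit of 50 errors is an illustrative value, not a recommendation:

```
backend webservers
    # watch live HTTP responses; 50 consecutive errors mark the server down
    server server1 192.168.50.2:80 check observe layer7 error-limit 50 on-error mark-down
```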
Set the observe parameter to layer4 to monitor all TCP connections for problems or to layer7 to watch all HTTP responses for errors. Successful responses are those that have an HTTP status code in the range 100-499, or a status of 501 or 505. The error-limit parameter sets how many consecutive requests can have errors before the on-error rule kicks in. Here, the rule marks the server as down.
Passive health checks always coexist with active health checks, with the latter doing its normal polling while also being responsible for reviving a server after it has been marked as down by a passive health check. In other words, you get both types of checking simultaneously. The benefit of that is that you will detect when only a part of your web application is malfunctioning, even if the active health check URL isn’t targeting that part. For example, if active health checks monitor the /health URL, but actual clients are getting errors on the /cart URL, HAProxy will detect that.
Beware that the active health checks will revive the server sooner or later, even if the /cart URL is still malfunctioning. One way to keep an unhealthy server down for longer is to require more consecutive passing checks before it is revived by setting the rise parameter higher. Another solution is to turn your passive health check into a full-blown circuit breaker by adding the slowstart parameter, which works well for backend services. We show how to do that in the blog post Circuit Breaking in HAProxy.
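Both ideas can be sketched on one server line. The values here (a rise of 10, a 20s slowstart, and the error limit) are assumptions for illustration; the linked post describes the full circuit-breaker setup:

```
backend webservers
    # require 10 passing checks before revival, then ramp traffic back up over 20 seconds
    server server1 192.168.50.2:80 check inter 2s rise 10 observe layer7 error-limit 50 on-error mark-down slowstart 20s
```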
Agent Health Checks
While actively polling servers and observing live traffic are great ways to detect failures, they don’t give you a rich sense of a server’s overall state. For example, you can’t easily tell how much CPU load is being placed on it or if it’s running dangerously low on disk space.
With HAProxy, you can communicate with an external agent, which is software running on the server that’s separate from the application being load balanced. Since the agent has full access to the system, it can check the machine’s vitals more closely.
Check the sample project in GitHub to see a working example.
External agents can do more than just respond back with a binary up or down status. They can send signals to HAProxy that update its state, such as:
- mark the server as up or down
- put the server into maintenance mode
- change the amount of traffic flowing to the server
- increase or decrease the maximum number of clients that can connect concurrently
The agent will invoke an action when it detects a particular condition on the server. The communication protocol between the agent and HAProxy is simply ASCII text sent over a TCP connection, which makes it easy to write your own external agent program. The agent might send back any of the following (note that the end-of-line character, \n, is required):
| Agent sends back | Result |
|---|---|
| down\n | The server is put into the down state |
| up\n | The server is put into the up state |
| maint\n | The server is put into maintenance mode |
| ready\n | The server is taken out of maintenance mode |
| 50%\n | The server’s weight is halved |
| maxconn:10\n | The server’s maximum number of connections is set to 10 |
On the HAProxy side, add an agent-check parameter to a server line to enable communication with the agent program.
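A sketch of such a server line, assuming a hypothetical agent listening on port 9999 of the same machine and polled every five seconds:

```
backend webservers
    # poll the agent at 192.168.50.2:9999 every 5 seconds, alongside the normal check
    server server1 192.168.50.2:80 check weight 100 agent-check agent-inter 5s agent-addr 192.168.50.2 agent-port 9999
```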
There are a few other parameters shown here, so let’s describe them. Use agent-inter to set the interval of the checks. Set the agent-addr and agent-port parameters to the IP address and port where the agent is listening. Using an external agent gives you flexibility in how a server is checked and provides more ways to react. For example, instead of shutting off a server, you might decide to simply dial back the amount of traffic it receives.
The HAProxy Enterprise Real-time Dashboard
When you operate a non-trivial infrastructure, it soon becomes obvious that you need a consolidated view of your system. HAProxy Enterprise has a dashboard, called the Real-time Dashboard, where you can observe the current status of all of your services.
Having a central management dashboard makes health monitoring much easier. You can easily filter the list and each server can be enabled and disabled with a button click. You can also apply changes to batches of servers without needing to update each one individually.
In this post, you learned how HAProxy provides three types of health checks: active health checks, passive health checks, and agent health checks. Enabling health checks ensures that users aren’t affected by malfunctioning servers.
Learn more about health checks by registering for our webinar, HAProxy Skills Lab: Health Checking Servers.