HAProxy Enterprise Documentation 2.1r1

Circuit Breaking

A circuit breaker is a mechanism that monitors services in real time, checking each service's responses for errors. If failures exceed a threshold, the circuit breaker flips into the open state and shuts off access to the service. Its purpose is to detect error conditions that may last a long time and, rather than letting dependent services keep calling the faulty service, to send back an error immediately. This prevents dependent services from trying to use the faulty service for a period of time.

You should not enable circuit breaking between your application and end users, since this could lead to a bad user experience. Use it between backend services, such as between proxied microservices.

Circuit breaker using the observe keyword

A simple implementation of the circuit breaker pattern uses the observe keyword to monitor live traffic for errors. Consider the following example, which marks a server as DOWN after it returns 50 consecutive errors:

backend serviceA
   default-server maxconn 30  check  observe layer7  error-limit 50  on-error mark-down  inter 1s  rise 30  slowstart 20s
   server s1 192.168.0.10:80
   server s2 192.168.0.11:80

How it works:

  • The default-server directive sets parameters that apply to all server lines in the backend section;

  • The check parameter enables health checking of the server;

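  • The observe layer7 parameter enables passive monitoring of the HTTP responses that the server returns to live traffic (for example, a 500, 502, 503 or 504 status, an unparsable response, or a timeout counts as an error);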

  • The error-limit 50 parameter sets a threshold of 50 consecutive errors, after which the on-error action is triggered;

  • The on-error mark-down parameter marks the server as DOWN when the error-limit is reached;

  • The inter 1s parameter sets the interval between active health checks (1 second). Once a server has been marked DOWN, these checks determine when it can be brought back online;

  • The rise 30 parameter sets how many consecutive successful health checks (30) are required before the server is brought back online. Multiplying the inter value by the rise value gives the minimum time that the server stays out of the load-balancing rotation (1 second x 30 = 30 seconds);

  • The slowstart 20s parameter ramps traffic up gradually over 20 seconds after the server recovers, until it accepts 100% of its maximum connections, as set by maxconn.

You may also set observe to layer4 if you prefer to monitor for unsuccessful connections to a server rather than failed HTTP responses.
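For example, a TCP service could reuse the same settings with observe layer4, as in the sketch below. The backend name serviceB, the addresses, and port 5432 are hypothetical placeholders. Note that observe layer7 is only allowed in HTTP backends, whereas observe layer4 also works with mode tcp.

backend serviceB
   mode tcp
   # Count failed TCP connections instead of failed HTTP responses
   default-server maxconn 30  check  observe layer4  error-limit 50  on-error mark-down  inter 1s  rise 30  slowstart 20s
   server s1 192.168.0.20:5432
   server s2 192.168.0.21:5432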

Circuit breaker using stick tables

In this more complex example, HAProxy Enterprise counts the HTTP 5xx errors returned by all servers in the backend. If those errors make up more than 50% of the requests received over a 10-second period, it disables access to the service by rejecting all new requests for the next 30 seconds.

backend serviceA
   stick-table type string  size 1  expire 30s  store http_req_rate(10s),gpc0,gpc0_rate(10s),gpc1

   # Is the circuit broken?
   acl circuit_open be_name,table_gpc1 gt 0

   # Reject request if circuit is broken
   http-request deny deny_status 503 if circuit_open

   # Begin tracking requests
   http-request track-sc0 be_name

   # Count HTTP 5xx server errors
   http-response sc-inc-gpc0(0) if { status ge 500 }

   # Store the HTTP request rate and error rate in variables
   http-response set-var(res.req_rate) sc_http_req_rate(0)
   http-response set-var(res.err_rate) sc_gpc0_rate(0)

   # Check if error rate is greater than 50% using some math
   http-response sc-inc-gpc1(0) if { int(100),mul(res.err_rate),div(res.req_rate) gt 50 }

   server s1 192.168.0.10:80 check
   server s2 192.168.0.11:80 check

How it works:

  • The stick-table line defines a one-entry table (size 1) that stores counters for the backend. It records the HTTP request rate over 10 seconds, the number of HTTP 5xx errors and their rate over 10 seconds (the generic counter gpc0 and gpc0_rate), and a second generic counter, gpc1, that acts as a flag that opens the circuit when the error percentage exceeds a threshold. The expire parameter sets how long to disable access to the service once the gpc1 flag has been incremented. In this example, the service is disabled for 30 seconds when it becomes faulty.

  • The circuit_open ACL looks up gpc1 using the backend's name as the key and checks whether it is greater than 0. If it is, the circuit is open.

  • The http-request deny line rejects all requests while the circuit is open, returning an HTTP 503 - Service Unavailable response in the meantime.

  • The http-request track-sc0 line tracks every request entering the backend in the stick table, keyed by the backend's name, so that the responses can be monitored for errors.

  • The http-response sc-inc-gpc0(0) line increments the error counter (gpc0) every time a server returns an HTTP 5xx response (that is, any HTTP error in the 500-599 range).

  • The http-response set-var lines set two variables. The first, res.req_rate, holds the HTTP request rate over the 10-second period. The second, res.err_rate, holds the HTTP 5xx error rate over the same period.

  • The http-response sc-inc-gpc1(0) line increments the gpc1 flag if errors make up more than 50% of the requests. This opens the circuit. The circuit stays open, and no requests are allowed into the backend, until the record expires from the stick table after 30 seconds.
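The percentage check relies on integer arithmetic: int(100) is multiplied by res.err_rate, the product is divided by res.req_rate (the div converter returns an integer), and the result is compared with 50. As a worked example with hypothetical figures of 40 requests in the 10-second period:

   \[ p = \left\lfloor \frac{100 \times \mathrm{err\_rate}}{\mathrm{req\_rate}} \right\rfloor, \qquad \text{the circuit opens when } p > 50 \]
   \[ \left\lfloor \frac{100 \times 21}{40} \right\rfloor = \lfloor 52.5 \rfloor = 52 > 50 \quad \text{(21 errors: the circuit opens)} \]
   \[ \frac{100 \times 20}{40} = 50 \le 50 \quad \text{(20 errors: the circuit stays closed)} \]

Because the comparison uses gt rather than ge, an error rate of exactly 50% does not open the circuit.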

To change the error threshold, replace 50 on the http-response sc-inc-gpc1(0) line with another percentage. To change how long the circuit stays open, change the expire parameter on the stick-table line.
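For instance, a variant that opens the circuit when more than 25% of requests fail, and keeps it open for 60 seconds, changes only two lines of the example above. The fragment below is a sketch; the values 25 and 60s are illustrative, not recommendations:

backend serviceA
   # Keep the circuit open for 60 seconds instead of 30
   stick-table type string  size 1  expire 60s  store http_req_rate(10s),gpc0,gpc0_rate(10s),gpc1

   # (all other lines are unchanged from the example above)

   # Open the circuit when more than 25% of requests return a 5xx error
   http-response sc-inc-gpc1(0) if { int(100),mul(res.err_rate),div(res.req_rate) gt 25 }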

