Synopsis
One of HAProxy's strengths is its flexibility: it can redirect traffic based on events and internal status.
In this article, I'll show how HAProxy can be useful for handling traffic when worst cases happen.
By worst case, I mean the moment when something goes wrong in your architecture and your application becomes partially or totally unavailable.
Cases
Backup server
When all servers in a farm are down, we want to redirect traffic to a backup server which delivers either sorry pages or a degraded mode of the application.
This can be done easily in HAProxy by adding the keyword backup on the server line. If multiple backup servers are configured, only the first active one is used.
Below, the HAProxy configuration corresponding to this case:
frontend ft_app
  bind 10.0.0.1:80
  default_backend bk_app_main

backend bk_app_main
  server s1 10.0.0.101:80 check
  server s2 10.0.0.102:80 check
  server s3 10.0.0.103:80 check backup
  server s4 10.0.0.104:80 check backup
In this case, once both s1 and s2 are down, s3 will be used first; if s3 fails too, then s4 takes over.
Multiple backup servers
In some cases, when the farm handles heavy traffic, we may want to use several backup servers at a time. This can be achieved by enabling the option allbackups in the HAProxy configuration.
Below, the HAProxy configuration corresponding to this case:
frontend ft_app
  bind 10.0.0.1:80
  default_backend bk_app_main

backend bk_app_main
  option allbackups
  server s1 10.0.0.101:80 check
  server s2 10.0.0.102:80 check
  server s3 10.0.0.103:80 check backup
  server s4 10.0.0.104:80 check backup
In this case, once s1 and s2 are down, both s3 and s4 will be used, as long as they are available.
Farm failover
Although the case above improves our failover scenario a bit, it has some weaknesses. For example, we must wait until all the production servers are DOWN before the backup servers are used.
HAProxy can fail traffic over to a backup farm when the main one no longer has enough capacity or, worst case, no capacity at all.
Below, the HAProxy configuration corresponding to this case:
frontend ft_app
  bind 10.0.0.1:80
  # detect capacity issues in production farm
  acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
  # failover traffic to backup farm
  use_backend bk_app_backup if MAIN_not_enough_capacity
  default_backend bk_app_main

backend bk_app_main
  server s11 10.0.0.101:80 check
  server s12 10.0.0.102:80 check
  server s13 10.0.0.103:80 check
  server s14 10.0.0.104:80 check

backend bk_app_backup
  server s21 20.0.0.101:80 check
  server s22 20.0.0.102:80 check
Farm failover with backup servers
Of course, we could combine all the options above.
First, we want to fail over to the backup farm if the production one does not have enough capacity; second, we want to use two backup servers when all the regular servers in the backup farm are DOWN.
Below, the HAProxy configuration corresponding to this case:
frontend ft_app
  bind 10.0.0.1:80
  # detect capacity issues in production farm
  acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
  # failover traffic to backup farm
  use_backend bk_app_backup if MAIN_not_enough_capacity
  default_backend bk_app_main

backend bk_app_main
  server s11 10.0.0.101:80 check
  server s12 10.0.0.102:80 check
  server s13 10.0.0.103:80 check
  server s14 10.0.0.104:80 check

backend bk_app_backup
  option allbackups
  server s21 20.0.0.101:80 check
  server s22 20.0.0.102:80 check
  server s23 20.0.0.103:80 check backup
  server s24 20.0.0.104:80 check backup
Worst case: no servers available anymore
Well, imagine you plugged all your servers into a single switch, and the HAProxy box has two interfaces: one on the public switch, one on the server switch. Of course, this is not how you plugged your servers, is it?
Now imagine the server switch fails: no servers are available anymore. HAProxy can then be used to deliver sorry pages for you.
Below, the HAProxy configuration corresponding to this case:
frontend ft_app
  bind 10.0.0.1:80
  # sorry page to return when worst case happens
  errorfile 503 /etc/haproxy/errorfiles/sorry.http
  # detect capacity issues in production farm
  acl MAIN_not_enough_capacity nb_srv(bk_app_main) le 2
  # failover traffic to backup farm
  use_backend bk_app_backup if MAIN_not_enough_capacity
  default_backend bk_app_main

backend bk_app_main
  server s11 10.0.0.101:80 check
  server s12 10.0.0.102:80 check
  server s13 10.0.0.103:80 check
  server s14 10.0.0.104:80 check

backend bk_app_backup
  option allbackups
  server s21 20.0.0.101:80 check
  server s22 20.0.0.102:80 check
  server s23 20.0.0.103:80 check backup
  server s24 20.0.0.104:80 check backup
And below, the content of the sorry.http page:
HTTP/1.0 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html>
<body>
<h1>Sorry page</h1>
Sorry, we're under maintenance
</body>
</html>
Important notes
Health checking
Health checking must be enabled on the servers. Without health checks, HAProxy cannot know a server's status and therefore cannot decide to fail traffic over.
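For example, the check behaviour can be tuned per server. The values below are purely illustrative (a 2-second probe interval, 3 failed checks to mark a server DOWN, 2 successful checks to bring it back UP):

```
backend bk_app_main
  # optional: use an HTTP request instead of a plain TCP connect
  option httpchk GET /
  # probe every 2s; DOWN after 3 failed checks, UP after 2 successful ones
  server s1 10.0.0.101:80 check inter 2s fall 3 rise 2
```

Tighter intervals detect failures faster at the cost of more check traffic; tune them to your environment.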
Persistence
If persistence information points to a backup server, then HAProxy will keep on using it, even once production servers are available again.
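As an illustration, with cookie-based persistence (a minimal sketch; the cookie name SERVERID is arbitrary), a client pinned to the backup server s4 will keep hitting it until its cookie expires or s4 goes DOWN:

```
backend bk_app_main
  cookie SERVERID insert indirect nocache
  server s1 10.0.0.101:80 check cookie s1
  server s2 10.0.0.102:80 check cookie s2
  server s4 10.0.0.104:80 check backup cookie s4
```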
200 OK? Wouldn't 503 be more appropriate?
RFC 2616 describes 503 as "The server is currently unable to handle the request due to a temporary overloading or maintenance of the server".
Hi,
A 503 could be interpreted by browsers, and they could display their own custom pages which won't contain the message you want to deliver.
Baptiste
Browsers may not display the error page if the content is less than 1Kb of html. Otherwise everything is displayed like normal pages.
I never experienced this behavior!
Baptiste
Hi Baptiste,
The problem with 200 is that the page could be cached by a proxy or indexed by crawlers. I think we should stick to the HTTP rules and return a correct 503 response when the service is really unavailable, even if browsers behave badly. If you really don't want to use a 5XX error page, maybe something like a 307 redirect to a 200 "sorry" page could do the job.
Guillaume
If our backup node fails, then why does our whole system go down? I have used the allbackups option:

option allbackups
s1 ip check
s2 ip check
s3 ip check backup