HAProxy is a load balancer, this is a fact. It is used to route traffic to servers, primarily to ensure application reliability.
Most of the time, sessions are stored locally on a server. This means that if you want to split client traffic across multiple servers, you have to ensure each user is sent to the server that manages their session (if that server is available, of course). HAProxy can do this in many ways: we call it persistence.
Thanks to persistence, we usually say that any application can be load-balanced, which is true in 99% of cases. In very rare cases, the application can’t be load-balanced: there might be a lock somewhere in the code, or some other good reason.
In such cases, to ensure high availability, we build “active/passive” clusters, where only one node can be active at a time.
HAProxy can be used in different ways to emulate an active/passive clustering mode, and this is the purpose of today’s article.
Bear in mind that by “active/passive”, I mean that 100% of the users must be forwarded to the same server. And if a failover occurs, they must all follow it at the same time!
Let’s use one HAProxy with a couple of servers, s1 and s2.
When starting up, s1 is master and s2 is used as backup:
  -------------
  |  HAProxy  |
  -------------
   |         `
   |active    ` backup
   |           `
 ------       ------
 | s1 |       | s2 |
 ------       ------
Automatic failover and failback
The configuration below makes HAProxy use s1 when it is available, and otherwise fail over to s2 if it is available:
defaults
  mode http
  option http-server-close
  timeout client 20s
  timeout server 20s
  timeout connect 4s

frontend ft_app
  bind 10.0.0.100:80 name app
  default_backend bk_app

backend bk_app
  server s1 10.0.0.1:80 check
  server s2 10.0.0.2:80 check backup
The most important keyword above is “backup” on the s2 configuration line.
Unfortunately, as soon as s1 comes back, all the traffic fails back to it. This can be acceptable for web applications, but not for active/passive.
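The selection rule behind “backup” can be sketched as a toy model. This is not HAProxy code: the function name pick_server and the tuple layout are made up for illustration, and the sketch assumes the default behaviour where only the first available backup server is used.

```python
# Toy model of the "backup" keyword: it only illustrates the selection rule.
def pick_server(servers):
    """Return the name of the server that receives traffic, or None.

    `servers` is a list of (name, is_up, is_backup) tuples in config order.
    """
    # A non-backup server that is up always wins.
    for name, is_up, is_backup in servers:
        if is_up and not is_backup:
            return name
    # Otherwise the first backup server that is up is used.
    for name, is_up, is_backup in servers:
        if is_up and is_backup:
            return name
    return None


# s1 up: traffic goes to s1.
print(pick_server([("s1", True, False), ("s2", True, True)]))
# s1 down: failover to s2.
print(pick_server([("s1", False, False), ("s2", True, True)]))
# s1 back up: traffic fails back to s1 straight away.
print(pick_server([("s1", True, False), ("s2", True, True)]))
```

Because the rule only looks at the current state of the servers, nothing remembers that a failover happened, which is why the failback is automatic.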
Automatic failover without failback
The configuration below makes HAProxy use s1 when it is available, and otherwise fail over to s2 if it is available.
Once a failover has occurred, no failback is performed automatically, thanks to the stick table:
peers LB
  peer LB1 10.0.0.98:1234
  peer LB2 10.0.0.99:1234

defaults
  mode http
  option http-server-close
  timeout client 20s
  timeout server 20s
  timeout connect 4s

frontend ft_app
  bind 10.0.0.100:80 name app
  default_backend bk_app

backend bk_app
  stick-table type ip size 2 nopurge peers LB
  stick on dst
  server s1 10.0.0.1:80 check
  server s2 10.0.0.2:80 check backup
The stick table will maintain persistence based on destination IP address (10.0.0.100 in this case):
show table bk_app
# table: bk_app, type: ip, size:20480, used:1
0x869154: key=10.0.0.100 use=0 exp=0 server_id=1
With such a configuration, you can trigger a failback by disabling s2 for a few seconds.
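The intended behaviour can be sketched as a toy model. This is not HAProxy code: the class name ActivePassiveModel and its attributes are made up for illustration, and the model assumes the single stick-table entry is rewritten whenever the remembered server is unavailable.

```python
# Toy model of the intended behaviour; it is not HAProxy code.
class ActivePassiveModel:
    """Two servers, s1 (active) and s2 (backup), plus a one-entry stick table."""

    def __init__(self):
        self.up = {"s1": True, "s2": True}
        self.stuck_on = None  # the single stick-table entry (keyed on dst)

    def route(self):
        """Return the server that would receive the next request, or None."""
        # An existing entry is honoured as long as that server is usable.
        if self.stuck_on is not None and self.up[self.stuck_on]:
            return self.stuck_on
        # Otherwise fall back to normal selection: s1 first, then backup s2.
        if self.up["s1"]:
            chosen = "s1"
        elif self.up["s2"]:
            chosen = "s2"
        else:
            return None
        self.stuck_on = chosen  # remember the choice for later requests
        return chosen


lb = ActivePassiveModel()
history = [lb.route()]      # both up: s1
lb.up["s1"] = False
history.append(lb.route())  # s1 down: failover to s2
lb.up["s1"] = True
history.append(lb.route())  # s1 back: still s2, no failback
lb.up["s2"] = False         # operator disables s2 briefly
history.append(lb.route())  # traffic moves back to s1
lb.up["s2"] = True
history.append(lb.route())  # s2 re-enabled: traffic stays on s1
print(history)
```

The remembered entry is what prevents the failback: s1 coming back changes nothing until the server currently in the table becomes unavailable.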
Just a few comments:
– you don’t need 20k entries since you stick on “dst”
– you can even stick on “always_true” which has only one value, and have a single entry in your stick-table.
You might not need 20k entries, but in a lot of scenarios your backend server will be configured for multiple IP addresses (e.g. in the case of SSL certificates). So I would define the stick table with, let’s say, at least 100 entries.
The dst IP which is stored is the IP from the frontend.
This is a ‘dst’ from a client connection point of view.
So definitely, the table size should be 1.
If you have 100 IPs on your frontend, then this article won’t help you. I should write a new one for such a scenario!
Isn’t it simpler to use a very high rise instead of peers? E.g.:
server s1 10.0.0.1:80 check rise 9999999
Your solution does not answer the issue at all…
Furthermore, your server will be back up after 115 days (if inter is 1s).
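The 115-day figure follows from multiplying rise by inter. A quick check, assuming one health check per inter period and that every check succeeds once the server is back (the function name days_until_up is made up for illustration):

```python
SECONDS_PER_DAY = 86_400


def days_until_up(rise, inter_seconds):
    """Days of consecutive successful checks needed before a server is marked up."""
    return rise * inter_seconds / SECONDS_PER_DAY


# "rise 9999999" with a 1s check interval
print(round(days_until_up(9_999_999, 1), 1))  # 115.7
```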
I appreciate posts like these and your other blogs, very informative. Apparently I missed your point, as I was under the assumption that you needed a mechanism to prevent automatic failback in an active/passive setting. I agree that adding ‘rise 999999’ does fail back to the master after more than 100 days, but well, it gives you plenty of time for a manual failback… Source: http://serverfault.com/questions/220681/prevent-haproxy-from-toggeling-back-from-fallback-to-master
Don’t trust everything you read on the Internet!
Thanks to HAProxy’s agility, a single goal can be achieved in many ways.
That said, in a production environment, the only reliable way is, from my point of view, the one in the blog.
> With such configuration, you can trigger a fail back by disabling s2 during a few second period.
I have thought a bit more about your solution, and I find ‘rise 9999999’ safer than using a ‘stick on dst’ table. This is because I never, ever want to fail back automatically. I have to be sure that things like DB replication have recovered first. I’m afraid that on a temporarily unstable network, the backup server will be flagged as down and hence a failback will start…
stick on dst already avoids automatic failback! This is why we use this solution!
Don’t use the nopurge option, it blocks table writing.
I don’t understand the table size. Refer to http://thread.gmane.org/gmane.comp.web.haproxy/21038 and https://bugzilla.redhat.com/show_bug.cgi?id=1211781: they set the size to 1000.
Any details about the table size?
I feel like I missed something here… When I implement this configuration on a simple 2-node haproxy solution:
– the table gets populated after the first request
# table: bk_ldap_mirror, type: ip, size:1, used:1
0x55f490608b74: key=192.168.1.2 use=0 exp=0 server_id=1
– if I shut down the s1 backend, failover happens and everything goes to s2, but there is no change in the table
– when I bring the s1 backend back up, all further requests go back to s1
What I expected:
– once s1 is down, the server_id value in the stick table would switch to 2
– when s1 is back online, traffic would stick to s2 unless it fails or is put into maintenance mode, in which case server_id in the stick table would change again.
I’m on haproxy 1.7.3. What am I missing?
What you describe is what should happen with this configuration. Either you’ve got a mistake or you’re facing a bug, I can’t say for now. Please first upgrade to 1.7.5 to fix known bugs and retry. If it doesn’t work, you should bring this to the mailing list as it might be a bug.
I have found that using nopurge allows for a failback. Removing nopurge makes it sticky.
As soon as my original server comes back up, if nopurge is set, the connection fails back.
I did not leave my failed connection down for an extended time.
What I don’t understand is this: if the table size is 1 and nothing is purged, then what is in the table when it fails over to the second connection? I display the table and it never changes with nopurge set. But when it is not set, the table updates and the connection persists on the new connection.
I don’t get the peers line. Are you supposed to indicate the origin point of the clients? What if it is dynamic?