Introduction
Redis is an open-source NoSQL database based on a key/value model.
One interesting feature of Redis is that it can persist data to disk, and a master can synchronize many slaves.
HAProxy can load-balance Redis servers with no issues at all.
There is even a built-in health check for Redis in HAProxy.
Unfortunately, there was no easy way for HAProxy to detect the status of a Redis server (master or slave node), so people usually hacked around this part of the architecture.
As the title of this post says, today we'll learn how to build a simple Redis infrastructure thanks to the newest HAProxy advanced send/expect health checks.
This feature is available in HAProxy 1.5-dev20 and above.
The purpose is to keep the Redis infrastructure as simple as possible and to ease failover for the web servers. HAProxy has to detect which node is the MASTER and route all connections to it.
Redis high availability diagram with HAProxy
Below is an ASCII art diagram of HAProxy load-balancing Redis servers:

+----+  +----+  +----+  +----+
| W1 |  | W2 |  | W3 |  | W4 |   Web application servers
+----+  +----+  +----+  +----+
     \     |      |     /
         +---------+
         | HAProxy |
         +---------+
           /      \
      +----+      +----+
      | R1 |      | R2 |         Redis servers
      +----+      +----+
The scenario is simple:
* 4 web application servers need to store and retrieve data to/from a Redis database
* one HAProxy server (two is better) load-balancing Redis connections
* 2 (at least) redis servers in an active/standby mode with replication
Configuration
Below is the HAProxy configuration for this setup:
defaults REDIS
 mode tcp
 timeout connect 4s
 timeout server 30s
 timeout client 30s

frontend ft_redis
 bind 10.0.0.1:6379 name redis
 default_backend bk_redis

backend bk_redis
 option tcp-check
 tcp-check send PING\r\n
 tcp-check expect string +PONG
 tcp-check send info replication\r\n
 tcp-check expect string role:master
 tcp-check send QUIT\r\n
 tcp-check expect string +OK
 server R1 10.0.0.11:6379 check inter 1s
 server R2 10.0.0.12:6379 check inter 1s
The health check sequence above allows HAProxy to consider only the Redis master server as UP in the farm and to route connections to it.
When the Redis master server fails, the remaining nodes elect a new one. HAProxy will detect it thanks to the same health check sequence.
It requires no third-party tools and makes failover transparent.
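The check dialogue above boils down to: connect, PING, ask for `info replication`, and accept the node only if the reply contains `role:master`. The decision made on the INFO reply can be sketched in a few lines of Python (the sample replies are illustrative, trimmed to the relevant fields):

```python
def is_master(info_reply: str) -> bool:
    """Return True if an 'INFO replication' reply reports role:master."""
    for line in info_reply.splitlines():
        if line.strip() == "role:master":
            return True
    return False

# Replies roughly as a Redis server would return them
master_reply = "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
slave_reply = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.11\r\n"
```

This is exactly what `tcp-check expect string role:master` does: a plain substring match on the server's reply.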
You are missing the "tcp-check connect" statement that makes the connection. Took me a few minutes to figure that out.
Funny how HAProxy 1.5.22 on Ubuntu 12.04 worked -without- tcp-check connect, and the same HAProxy on a CentOS 6.5 didn't work without tcp-check connect.
thanks for the tip.
Hi,
This is not funny, it deserves an explanation!
Could you send a mail to the ML with more information about your environment and your (anonymized) configuration as well?
Baptiste
Hi,
I have been testing a very similar scenario for quite a while, and there is a problem that HAProxy needs to handle properly. The main idea is that HAProxy detects Redis masters by querying each back-end server, which is expected for load-balancing but at the same time problematic for HA in the special case of Redis. Redis has a master-slave sync option which, combined with Sentinel, works pretty well. However, HAProxy does not get the current master details from Sentinel. So if a network partition occurs, a slave is promoted to master while the old master is isolated; if the old master comes back, it comes back as a master, and HAProxy will see it as a valid back-end and send queries to it for a few good seconds until Sentinel reconfigures it to be a slave.
I believe this could be "easily" solved by making HAProxy get the ip and port of the current master from Sentinel, since Sentinel is the authority in that Redis infrastructure. Just playing with inter and rise to give Sentinel enough time to fix the two-masters-at-the-same-time problem is not reliable and adds a huge delay to the fail-over scenario.
What do you think about this? Is there something I am missing?
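For reference, asking Sentinel for the current master is a one-command exchange: `SENTINEL get-master-addr-by-name <name>` returns a two-element RESP array containing the master's ip and port. A minimal sketch of parsing that reply (the helper name and sample bytes are illustrative, not part of any HAProxy feature):

```python
def parse_master_addr(reply: bytes):
    """Parse the RESP reply of SENTINEL get-master-addr-by-name:
    a *2 array header, then two bulk strings ($len header + payload)."""
    parts = reply.split(b"\r\n")
    if not parts or parts[0] != b"*2":
        raise ValueError("unexpected sentinel reply: %r" % reply)
    ip = parts[2].decode()
    port = int(parts[4])
    return ip, port

# As returned by a sentinel that currently sees 10.0.0.11:6379 as master
sample = b"*2\r\n$9\r\n10.0.0.11\r\n$4\r\n6379\r\n"
```

An external helper script doing this against each sentinel could then rewrite or signal the load-balancer, since HAProxy itself does not speak to Sentinel.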
Hi adichiru,
You raise a valid point. I have been experiencing the same issue. Were you able to configure the HAProxy check successfully with Sentinel?
I’m also interested in making it work with sentinel checks.
Have you managed to use it?
Thanks Adichiru,
A very valid point: split brain is a nightmare, and you may lose a significant number of writes. We are facing the same problem here. We could not find a solution in HAProxy, so we ask Sentinel directly for the master information, which works fine if you have a smart client like Jedis in Java.
Has anyone found a solution?
> Has anyone found a solution?
I did: https://selivan.github.io/2016/06/10/redis-no-splitbrain-on-network-partition.html
Shortly speaking: put 'slaveof 0.0.0.0 6379' in every redis.conf, and forbid Redis to rewrite it. After a restart, the old master will become a slave with an unreachable master, refusing any write attempts. And after some time, the sentinels will make it a slave of the new master.
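For reference, the trick from the linked post amounts to one line in every node's redis.conf (plus keeping Redis from rewriting the file; treat the exact mechanics as the linked article's, not verified here):

```
# Start every node as a replica of an unreachable address: a restarted
# old master refuses writes until Sentinel reconfigures it properly.
slaveof 0.0.0.0 6379
```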
Adichiru brings up a valid concern. An option we are experimenting with for such cases is to check the number of masters in the backend pool and do a tcp-request reject if it is not equal to one. I think this should work in most cases, as the safest choice in a split-brain situation is to not write anything to Redis at all. If you do continue to write, it is hard to avoid data loss.
For the configuration above, the required tweak would be:
frontend ft_redis
bind 10.0.0.1:6379 name redis
acl single_master nbsrv(bk_redis) eq 1
tcp-request connection reject if !single_master
use_backend bk_redis if single_master
## Check 3 sentinels to see if they think redisA is master
backend check_master_redisA
mode tcp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send SENTINEL\ master\ myharedis\r\n
tcp-check expect string 192.168.1.13
tcp-check send QUIT\r\n
tcp-check expect string +OK
server sentinelA 192.168.1.10:26379 check inter 2s
server sentinelB 192.168.1.11:26379 check inter 2s
server sentinelC 192.168.1.12:26379 check inter 2s
## Check 3 sentinels to see if they think redisB is master
backend check_master_redisB
mode tcp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send SENTINEL\ master\ myharedis\r\n
tcp-check expect string 192.168.1.14
tcp-check send QUIT\r\n
tcp-check expect string +OK
server sentinelA 192.168.1.10:26379 check inter 2s
server sentinelB 192.168.1.11:26379 check inter 2s
server sentinelC 192.168.1.12:26379 check inter 2s
## Check redisA to see if it thinks it is master
backend redisA
mode tcp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redisA 192.168.1.13:6379 check inter 2s
## Check redisB to see if it thinks it is master
backend redisB
mode tcp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redisB 192.168.1.14:6379 check inter 2s
## If at least 2 sentinels agree with the redis host that it is master, use it.
listen redis_master :6379
mode tcp
use_backend redisA if { srv_is_up(redisA/redisA) } { nbsrv(check_master_redisA) ge 2 }
use_backend redisB if { srv_is_up(redisB/redisB) } { nbsrv(check_master_redisB) ge 2 }
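The selection rule above ("use a node only if it is UP and at least 2 sentinels vote for it") can be restated as a small function; a minimal Python sketch with illustrative names:

```python
def pick_master(server_up, sentinel_votes, quorum=2):
    """Return the first backend that is UP and that at least
    `quorum` sentinels report as the current master, else None."""
    for name, up in server_up.items():
        if up and sentinel_votes.get(name, 0) >= quorum:
            return name
    return None

# redisA is up and all 3 sentinels agree it is master; redisB is a slave
chosen = pick_master({"redisA": True, "redisB": True},
                     {"redisA": 3, "redisB": 0})
```

In HAProxy terms, `srv_is_up(redisA/redisA)` supplies the "UP" input and `nbsrv(check_master_redisA)` supplies the vote count, since each sentinel-check backend counts one server per agreeing sentinel.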
Hi James,
Thanks for the tip about sentinel.
I’ll write later a full article about redis/sentinel using your example.
Baptiste
Have you written this full article about redis/sentinel? I have a similar problem currently, with the added complexity that I’m deploying a redis/sentinel cluster using kubernetes, in which IPs are dynamic. I have a Sentinel service to discover Redis masters. However, is it possible to dynamically update the HAProxy configs with a new master IP?
e.g. src/redis-cli -h kubelb.host.name.com -p 26379 SENTINEL masters
Big thanks to James!!
I provide my own modified version, which uses fewer backend sections:
# check 3 sentinel to see if they think redis1 is master
backend check_master_redis1
mode tcp
option tcp-check
tcp-check connect
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send SENTINEL\ master\ mymaster\r\n
tcp-check expect string 10.10.10.100
tcp-check send QUIT\r\n
tcp-check expect string +OK
server sentinel1 10.10.10.100:26379 check inter 2s
server sentinel2 10.10.10.101:26379 check inter 2s
server sentinel3 10.10.10.102:26379 check inter 2s
# check 3 sentinel to see if they think redis2 is master
backend check_master_redis2
mode tcp
option tcp-check
tcp-check connect
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send SENTINEL\ master\ mymaster\r\n
tcp-check expect string 10.10.10.101
tcp-check send QUIT\r\n
tcp-check expect string +OK
server sentinel1 10.10.10.100:26379 check inter 2s
server sentinel2 10.10.10.101:26379 check inter 2s
server sentinel3 10.10.10.102:26379 check inter 2s
# check 3 sentinel to see if they think redis3 is master
backend check_master_redis3
mode tcp
option tcp-check
tcp-check connect
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send SENTINEL\ master\ mymaster\r\n
tcp-check expect string 10.10.10.102
tcp-check send QUIT\r\n
tcp-check expect string +OK
server sentinel1 10.10.10.100:26379 check inter 2s
server sentinel2 10.10.10.101:26379 check inter 2s
server sentinel3 10.10.10.102:26379 check inter 2s
# decide redis backend to use
frontend ft_redis
bind *:6379
mode tcp
acl network_allowed src 10.10.0.0/16
tcp-request connection reject if !network_allowed
timeout connect 4s
timeout server 15s
timeout client 15s
use_backend bk_redis
# Check all redis servers to see if they think they are master
backend bk_redis
mode tcp
option tcp-check
tcp-check connect
tcp-check send AUTH\ MYPASSWORD\r\n
tcp-check expect string +OK
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
use-server redis1 if { srv_is_up(redis1) } { nbsrv(check_master_redis1) ge 2 }
use-server redis2 if { srv_is_up(redis2) } { nbsrv(check_master_redis2) ge 2 }
use-server redis3 if { srv_is_up(redis3) } { nbsrv(check_master_redis3) ge 2 }
server redis1 10.10.10.100:6379 check inter 2s
server redis2 10.10.10.101:6379 check inter 2s
server redis3 10.10.10.102:6379 check inter 2s
It appears that 'info replication' blocks while the master is saving, which can cause HAProxy to mark the master as offline if you have a large dataset (there's about 21 GB in this one).
redis-server.log (2.6.13) :
[2102] 02 Apr 11:23:19.087 * 10 changes in 300 seconds. Saving…
[2102] 02 Apr 11:23:26.915 * Background saving started by pid 30215
Redis client :
$ date; time echo -ne "INFO REPLICATION\r\nQUIT\r\n" | nc redis.master.host 6379
Thu Apr 2 11:23:19 UTC 2015
$116
# Replication
role:master
connected_slaves:2
+OK
real 0m7.382s
user 0m0.000s
sys 0m0.008s
Note that the command took 7 seconds, which is also the difference between the two redis log lines.
I’m not saying that this is a bad way of automatic master failover, but it’s just something to be aware of and your HA proxy needs to be tuned to be able to cope with how long your bgsave takes to fork.
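The effect of too tight a deadline can be demonstrated without Redis at all: a health check with a hard timeout marks a slow-but-alive node as failed. The sketch below uses a deliberately slow local stub server (all names are illustrative; HAProxy's equivalent knobs are `timeout check` and `inter`/`fall`):

```python
import socket
import threading
import time

def slow_redis_stub(ports, delay):
    """Accept one connection, wait `delay` seconds (simulating INFO
    blocked behind a bgsave fork), then answer like a master."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    ports.append(srv.getsockname()[1])
    conn, _ = srv.accept()
    time.sleep(delay)
    try:
        conn.sendall(b"role:master\r\n")
    except OSError:
        pass  # checker may already have given up and closed
    conn.close()
    srv.close()

def check_master(port, timeout):
    """Health check with a hard deadline: a reply slower than
    `timeout` counts as a failed check, not necessarily a dead node."""
    try:
        s = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        data = s.recv(64)
        s.close()
        return b"role:master" in data
    except (socket.timeout, OSError):
        return False

ports = []
threading.Thread(target=slow_redis_stub, args=(ports, 0.5), daemon=True).start()
while not ports:
    time.sleep(0.01)
slow = check_master(ports[0], 0.1)   # 0.5 s reply vs 0.1 s deadline: fails

ports2 = []
threading.Thread(target=slow_redis_stub, args=(ports2, 0.0), daemon=True).start()
while not ports2:
    time.sleep(0.01)
fast = check_master(ports2[0], 2.0)  # generous deadline: passes
```

The lesson matches the comment: size the check timeout (and the number of allowed failures) to the worst-case fork/bgsave pause of your dataset.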
This doesn't handle the situation where the first node is a master that is shut down gracefully and Sentinel promotes the slave. When the original master is brought back online, HAProxy will think it is a master, which will then cause clients to fail if that node happens to be loading a large dataset into memory. The ideal solution would be to add a check for whether the master is loading data into memory, and consider it a down node until it is able to handle reads and writes.
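Redis does expose this state: the `INFO persistence` section contains a `loading` flag that is 1 while the dataset is being read back into memory. The extra gate described above can be sketched like this (sample replies are illustrative; an HAProxy-side equivalent would be an additional expect on the `loading:0` string, which I have not tested):

```python
def is_serving(info_persistence: str) -> bool:
    """A node reporting loading:1 is still reading its dataset from
    disk and should be treated as down even if it claims role:master."""
    for line in info_persistence.splitlines():
        if line.strip().startswith("loading:"):
            return line.strip() == "loading:0"
    return False  # no loading field at all: be conservative

ready = "# Persistence\r\nloading:0\r\nrdb_bgsave_in_progress:1\r\n"
warming_up = "# Persistence\r\nloading:1\r\n"
```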
Hello All,
We have two Redis servers behind HAProxy, but I need all traffic to go to Redis-web1 only, and HAProxy should divert traffic to Redis-web2 only when Redis-web1 is down.
Is this possible? Please advise.
Thanks
Sushil R
Note the AUTH command in case you need to authenticate yourself against redis:
option tcp-check
tcp-check connect
tcp-check send AUTH\ xxxxxxxxxxxxxx\r\n
tcp-check expect string +OK
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
Big thanks to James on this one, saved me a huge headache trying to provide HA for our developers.