From 1.5 into the Future: How HAProxy Rose from a Simple Load Balancer Replacement into our Swiss Army Knife
In this presentation, Christian Platzer explains why Willhaben chose HAProxy as their load balancer. Although they initially needed a way to add HTTPS to the site, they soon found that HAProxy gave them much more. It gave them the ability to pin SSL encryption work to specific CPU cores, use HTTP/2, extend functionality with Lua, and route traffic into Kubernetes. There were a few challenges along the way, which Christian says were much easier to solve with the help of the HAProxy Technologies support staff. Willhaben also uses the Enterprise edition’s Antibot module to protect against crawlers.
Hey everyone. I wanted to give back the applause to this enormous and gorgeous organization of this event. I think it’s in order to have a little applause for everything involved here, all the techs and stuff. Thanks a lot for being here. I’m really excited. Yeah, my name is Christian Platzer.
So it’s pronounced Willhaben, which, if you translate it into English, is something like “I wanna have this”, “wanna have”. Yeah, it’s a youth term. And we are Austria’s largest classifieds site, classified as in Craigslist and not top secret. We are also one of Austria’s largest sites as such. So, we are top in Austria in terms of page requests and page impressions, which is kind of a contradiction. Austria is not a big country, but we are a large site. I’ll give you some numbers and you can decide if we are a large company or not.
It all started in 2015, which still amazes me; I have to look up this date every time I do a presentation because it’s just 4 ½ years ago. Back then we were around something like this. So, maybe as a background, Willhaben is a startup company. We started with very few people and it grew over time and now we have like 250 people, 250 employees, and you can imagine that there are a lot of challenges we had to tackle to put up with the growth.
Back then it was in the range of 600 to 800 megabits per second. This is a decent amount of traffic, but still it’s not that much, and 99% of it was HTTP. So, we were solely delivering HTTP requests or responses. That was it, except for login and payment. This was just 4 ½ years ago. Now we are running HSTS on our platform. So there is no HTTP anymore. Of course, we had a load balancer at the time of an undisclosed name. We had just four pools: our web pool, one for the API, the desktop, and the caching image pool. That was it and we were really limited in operating these pools. We just had the ability to drain them and disable them over the web frontend. There were no API requests for us because we are not hosting the infrastructure ourselves. We start from operating system level up and this also didn’t change over time. It cost a lot, this undisclosed load balancer solution.
I told him, “Yeah we could, but it will cost a lot.” You’ve seen the numbers from the first presentation. It really costs a lot and plus we’re not happy with it because we still would have to deal with this limited functionality that we’d need to put up with. So then this colleague of mine, let’s call him Mr. O., he was all like, “It’s Linux, baby! We can do this on software.”
I didn’t know HAProxy at the time, but he is one of these tech guys constantly pushing for new solutions and stuff, and he said that we can pull it off, and in my opinion, I always thought that if something is done in hardware it has to be way better than software because otherwise they would do it in software. So, I was really skeptical about replacing something like an F5 with a software solution that I didn’t even know at the time; but after some initial testing…yeah, I thought, maybe we can pull it off. We approached our manager. He gave his okay and that’s how HAProxy 1.5 came to pass.
The funny thing is with this kind of setup we kind of emulated a hitless reload because you could drain one backend, reconfigure it, put it up again, drain the second backend, configure it and put it up again. We usually didn’t touch our frontend process because we only needed to touch it or reload it when we really needed a new IP and that was not a common case back in the days. So, this worked pretty well. It was perfectly fine to deal with our 600 to 800 megabits per second. We always had core stickiness. That’s something that we also decided to keep even for the new implementations, and we were pretty happy with it.
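The core stickiness mentioned above can be sketched for a 1.5-era multi-process setup roughly as follows. This is an illustration under assumptions, not the configuration Willhaben ran: the process count and core numbers are made up, and recent HAProxy releases have dropped nbproc in favor of threads.

```haproxy
global
    # Run four worker processes (multi-process mode of older releases)
    nbproc 4
    # Pin each process to its own CPU core: process number first, then core number
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3
```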
Of course, when we first switched the DNS there were some problems. We didn’t account for the enormous number of connections that would go over our backend if you keep it in Keep-Alive mode. We had to switch it to Server-Close there, but it was nothing that you couldn’t get under control. Yeah, enough of that.
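As a minimal sketch, the switch to Server-Close corresponds to a single option in the HTTP defaults (or in the individual proxy sections). Placing it in a defaults section is an assumption made here for brevity.

```haproxy
defaults
    mode http
    # Keep-alive toward the client, but close the server-side connection
    # after each response so idle connections do not pile up on the backends
    option http-server-close
```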
Now we are running…so how did we solve this? We had this thing for some time. Now we are running everything in 10 gigabit interfaces, so everything is scaled up now. But at the time it was simply too expensive and one problem that we ran into was something like that. I think you can read it or decipher it. We had four of these interfaces connected into the link aggregation and one of them was overloaded. So, we hit the cap of one of the interfaces because at that time we only had two cache servers with two IPs and chances…so the bandwidth was not evenly distributed and we saturated one of the links, which had some really weird effects. We thought that HAProxy might have a bug at first, but then we looked at it and saw, “Okay, this is something completely different.” This was also the cause why we decided to not go for link aggregation. Let’s see, light hearted. Again, because we are already hitting the 10 Gig limit. No we are not hitting it, but we’re getting close to it so we’re still growing and that’s maybe something that we have to come up with in the future.
So, there are a lot of good reasons for us to go to multiprocessing; and one of the major implications was that it was not strictly required for Kubernetes, but we really wanted to have it for Kubernetes because in Kubernetes you need an Ingress Controller and it would be nice to just touch one configuration file there. Multithreading is something that we really wanted. It worked flawlessly without HTTPS. Remember, we had HTTP on the backend side, solely HTTP. There we were running from version 1.8 without any issues. This was really nice, although now it’s stable in 1.9r1. But it proved to be challenging when we enabled HTTP/2, or h2; challenging in terms of undesired behavior. I’m going to go into detail about what happened there on a later slide.
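For comparison, a threaded setup with pinned threads might look like the sketch below. The thread count of 14 is taken from the figures given at the end of the talk; the core range and the idea of pinning one thread per core are assumptions, and the cpu-map auto syntax requires HAProxy 1.8 or later.

```haproxy
global
    # One process with 14 threads instead of several processes
    nbthread 14
    # Bind threads 1-14 of process 1 to CPU cores 0-13, one thread per core
    cpu-map auto:1/1-14 0-13
```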
Also, one thing that I really took care to have is a guaranteed syntax correctness. So, at any point in time I wanted to have a reloadable, restartable configuration at the host machines. You might think, how can it happen that it’s not that way? The solution for these problems is actually quite simple. In HAProxy 1.9 hitless reloads were introduced. Yay! We immediately switched to 1.9. It was a good decision and we finally were able to put everything in one process, have one configuration file where everything is there and go for Kubernetes.
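A minimal sketch of a configuration prepared for reloads without dropped connections, assuming a release that supports master-worker mode and the expose-fd listeners socket option. The socket path is hypothetical. With this in place, a reloading process can take over the listening sockets from the old one instead of re-binding them.

```haproxy
global
    # The master process supervises the workers and handles reloads
    master-worker
    # expose-fd listeners lets a new process retrieve the listening sockets
    # from the running one over this socket
    stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners
```

The syntax guarantee mentioned above is usually enforced separately, by running HAProxy in check mode (the -c flag) against the candidate files before any reload.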
In Kubernetes we have a self-written Ingress Controller. So, we are not on 2.0 yet, but it’s definitely something we will have a look into. Until now we’re using self-written Ingress Controller. It’s also written in Go, it works pretty nice, but does not support everything you could do in a Kubernetes Ingress itself. There are certain things missing there.
We’re using optional backends. What does this mean? Usually if you have a configuration you have some IP address. We do it like this. That’s an ACL. We bind all the IP addresses in a list. It’s not that many. You have an IP address and say: hey, if the destination address is this, then use this backend k8sapp. If you just put it there, the below configuration is generated.
We include two configuration files: one is the main configuration; one is the configuration provided by the Ingress Controller. If you do it like this and you simply use use_backend Kubernetes app and someone decides to un-expose the service, then the backend is deleted and you cannot reload HAProxy again or restart again because it’s a syntax error, because you have a backend used that’s not there. If you do it like this it won’t work, but everything else will work and that’s the thing that we go for. I don’t know if there’s a better solution. Probably there is, but that’s what we came up with. I don’t know, you know how it is. You have a problem, you find a solution, it works, you keep it that way.
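One way to get this behavior is a use_backend rule whose backend name is a log-format expression. HAProxy resolves such a name at run time instead of at parse time, so a backend that the Ingress Controller has removed no longer makes the configuration invalid, and the request falls through to default_backend. The sketch below is an assumption about how that could look, not the configuration from the slide; the frontend name, addresses, ACL name, and fallback backend are hypothetical.

```haproxy
frontend fe_main
    mode http
    bind :80
    # Hypothetical address: matches traffic sent to the IP reserved for this app
    acl dst_k8sapp dst 192.0.2.10

    # Static form: parsing fails if backend k8sapp is missing from the generated file
    # use_backend k8sapp if dst_k8sapp

    # Dynamic form: the name is evaluated per request, so a missing backend
    # is not a configuration error
    use_backend %[str(k8sapp)] if dst_k8sapp

    default_backend be_fallback

backend be_fallback
    mode http
    server fallback1 192.0.2.20:8080 check
```

Both files can be passed to HAProxy by repeating the -f option, one per file.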
It had a huge impact in TCP sessions that we had on our servers. Of course, we also wanted to be cool and have h2 enabled as fast as possible, and we did enable it. It’s essentially just, you put in the right things in HAProxy and HAProxy does the magic for you. It’s done. It’s done and we got all that we wanted, so now we were cool. We had less connections, everything was really fine.
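As a sketch of “putting in the right things”, HTTP/2 on the client side is typically enabled by advertising h2 over ALPN on the TLS bind line (HAProxy 1.8 or later). The frontend name and certificate path here are hypothetical.

```haproxy
frontend fe_https
    mode http
    # Offer HTTP/2 first and fall back to HTTP/1.1 via ALPN
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
```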
We also had some segfaulting. It looks bad, but if you think about it the busy looping is far worse because usually, first of all, our prime time is not during working hours. So people get home from work and they scroll through our ads and that’s mainly in the evening, somewhere in the evening, and then we have our prime times and all these problems were prone to happen during our prime times, during high-load scenarios.
Sometimes one of the connections just got stuck and it essentially just took the resources of one of the processes. This by itself is something that will not pop up in some monitoring because it’s just a higher load. I mean our load, normally, is in the range of six cores during rush hours and then it was seven cores. We do have monitoring on the load there and if more than one gets stuck, we get an alert and we can restart it there, but still it can be pretty annoying.
On the other hand, if the process segfaults, it stops and it’s immediately restarted. Sometimes it was so fast that Keepalived was not failing one single health check in the time. HAProxy starts really fast, so it produced a downtime of one to three seconds or something, which is still bad, but not that bad. A segfault, in my opinion, is something that we could live with easier.
Then, we also had some very strange behavior that I wanted to digress into and give an overview of. We saw a lot of these lines, and with “a lot” I mean like 40,000 to 60,000 a day, where, apparently, HAProxy…So, CD means the client aborted the connection and it was in the data phase. So, the client sent the request, everything was okay, header or everything, HAProxy directed it to the backend, started receiving the answer, and then cut the connection. We also did a TCP dump and it was just like that. HAProxy sent the reset packet.
We also saw this in our applications. We are running Java and there it was, a lot of these lines: Error while sending response to client. Connection reset by peer. So, the peer reset the connection. This has to be bad, right? After hours of digging and reproducing this stuff, we found out that it’s a Firefox feature and it’s called Race Cache with Network. Who of you knows this Firefox feature? Raise your hands. One. Please keep it up, I can see it. Yeah that’s one geek. Two, three. Yeah, about the same happened at our company.
Race Cache with Network in Firefox works like this: If there’s a cached resource, an image or something, in our case a configuration, and Firefox requests this resource longer than 500 milliseconds after the initial page load, then Firefox decides, “Okay, this web page is apparently very, very slow and most probably it’s caused by a slow computer. A lot of laptops are still running spinning discs and if it takes longer than 500 milliseconds, you know what I do? I just request the file, which should be cached from the local cache, and simultaneously I produce a network request to the original source and whichever arrives first, I will take.”
That was the first thing that boggled us. Why are these resources even queried in the first place because they should be cached on the client, right? We thought we messed up something there, but this was the case and it took us really a long time to get to the point of it. That was what I wanted to mention. Before we enabled HTTP/2, there was no way for the client to abort this request except terminating the TCP connection, which it wouldn’t do. It wouldn’t terminate the TCP connection. I don’t know why. In HTTP/2, it could because it just used the already established stream and immediately sent an abort. The TCP connection was still established and that’s it. This was a feature that impacted us because we enabled HTTP/2, but was not really related to it. Quite interesting. If you see something like this, mind my words.
Just two weeks ago we had a student at the university trying some things and he was producing searches as fast as the network could handle from his mobile device. It was enough that we really saw him. I could tell you a lot of stories in these requests. Also, it impacts our business because it sometimes produces the wrong access numbers if you want to show how many requests were to your ad, for instance, then this could be heavily influenced by a crawler.
Right from the start we implemented something from HAProxy which is the tarpitting functionality. This is quite easy, I really like it. What it essentially does is you have a blacklist of IP addresses and if this IP address requests your service, then after a grace period of ten seconds it gets a 503 or a 502 and it looks like the service is down. It works pretty well, but you have to really take care because it’s highly disruptive. You are actually blocking all of the requests to your site from that IP address and this could be a bad idea if a company is running a NATed network or a web proxy, for instance. Then, you always see the same IP address and if it misbehaves, if you block the whole range or the whole IP address, then it could lead to bad side effects. Of course, it’s also dangerous if you have a dynamic IP pool. Most mobile devices are assigned dynamic IP pools and if you blacklist them it could easily happen that the IPs switch and then a new customer suddenly cannot access your site anymore and you don’t know why.
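A minimal sketch of such a tarpit rule, assuming the blacklist lives in a plain file with one address or CIDR range per line. The frontend name, ACL name, and file path are hypothetical; the ten-second delay is the grace period mentioned above.

```haproxy
frontend fe_main
    mode http
    bind :80
    # How long a tarpitted connection is held open before the error is returned
    timeout tarpit 10s
    # Hypothetical file listing the blacklisted source addresses
    acl blacklisted src -f /etc/haproxy/blacklist.lst
    # Hold the request, then answer with an error instead of forwarding it
    http-request tarpit if blacklisted
```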
It was really easy to implement the different strategies for how to produce this challenge. Either it’s automated, you can even include Google CAPTCHAs, a wide variety of things that you can do, and it looks like this. Sorry. I had to blind the secrets, but essentially all the logic to decide if such a challenge is produced in the first place, is contained in a Lua script. I decided to go Lua there. Most probably you could do most of the things in the configuration itself, but it’s pretty nice to have it contained in a Lua script, then you can code different triggers and stuff. We came up with this.
For this we usually use leastconn because it’s really perfect for Java applications where you have, occasionally, a garbage collection, and if you use leastconn as a balancing algorithm, you essentially don’t overload. You redistribute, during small hiccups you redistribute all the requests and that’s simply not possible in Kubernetes anymore. That’s why we had to switch Kubernetes to Flannel, which is the overlay network to LVS. So, it supports LVS and there leastconn is supported again.
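On the HAProxy side, the balancing choice described above is a one-line setting per backend. A minimal sketch, with hypothetical backend and server names and addresses:

```haproxy
backend be_java_app
    mode http
    # Send each new request to the server with the fewest active connections,
    # so a server paused by garbage collection stops receiving new work
    balance leastconn
    server app1 192.0.2.11:8080 check
    server app2 192.0.2.12:8080 check
```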
Secondly, something that I wanted to show you, we just started dynamic environments. So, the thought was for every pull request we wanted to have something like a dynamic environment that can be accessed. So, a developer creates a pull request, a complete dynamic environment spawns up, they test it and if it’s okay, the test is okay, it’s merged, then it’s torn down again. We managed to do this with Kubernetes, but what we needed is a URL.
The URL looks something like this. If you have a URL and a hostname that looks like this and you want to access a backend that looks like this, I knew that this is one of the things that, when you know HAProxy, you know that it can do it. It might be a hassle to implement it, but you know it can be done and that’s how I did it. I know maybe it’s not beautiful, but it works. It’s dyna_ and it takes the Host header. You read it from left to right. It takes the Host header, the whole thing, to lowercase, substitute dashes with points, and take the second field, which is the hash, and append it to Kubernetes. Shame on me, but it works perfectly! I’m really happy with this solution and now we implemented this dynamic application. The only thing that you have to do is create the wildcard CNAME for the subdomain. That’s everything you have to do, essentially, to get everything to the same IP address and then make this decision. That was it.
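The converter chain described above can be sketched as a single use_backend rule. The hostname pattern, frontend name, and resulting backend name are assumptions for illustration; the slide’s actual expression is not reproduced in the transcript. The comments trace one hypothetical Host header through each step.

```haproxy
frontend fe_dynamic
    mode http
    bind :80
    # Hypothetical Host header: PR-3f9c2a-app.example.com
    #   lower          -> pr-3f9c2a-app.example.com
    #   regsub(-,.,g)  -> pr.3f9c2a.app.example.com
    #   field(2,.)     -> 3f9c2a
    # Resulting backend name: dyna_3f9c2a
    use_backend dyna_%[req.hdr(host),lower,regsub(-,.,g),field(2,.)]
```

Because the backend name is computed per request, a hostname whose backend does not exist yet does not break the configuration; the request is handled by default_backend if one is set.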
We’re dishing out 700, 750 terabytes of traffic in 30 days and, at the moment, we are running 55 backends. More than half of them are generated by Kubernetes. It’s going in a microservice direction and will be more. Everything is handled with just 14 threads. We had no need to expand this. We can deal with this traffic easily with seven cores at peak time.