HAProxy as Egress Controller
In this presentation, Julien Pivotto explains how Inuits uses HAProxy in an unconventional way: as a forward proxy to route outgoing traffic. This unique use case has uncovered a trove of useful features within HAProxy. They have leveraged HAProxy to transparently upgrade older applications to newer versions of TLS. It lets them connect to servers using DNS, but fall back to IP addresses if the DNS records can’t be resolved. It allows them to rewrite URLs, specify access controls, set application-specific timeouts, implement canary releases, automate rollouts after a specific date, and get detailed metrics.
Hello, hello. So my talk will not be about performance. It will not be about scale because I come from Belgium and basically in Belgium we have nothing at a very huge scale. My story will be about how we use all the features that HAProxy can provide us to achieve exactly what we need to achieve. Setting up the scene. This is about using HAProxy to send requests to the outside world.
We also have a lot of services, and it means that we have some new ones that are like a few months old. But we also have some services that are like 10, 20 years old that are still running and that are still doing some kind of business. So we still need to maintain stacks ranging from very old to very new technologies. And another thing that’s important is that we use SOAP and REST APIs. So, we don’t need to do a lot of following on the HTTP requests. We only get straightforward URLs that we know in advance. So it makes all of this quite easy. Those very old technologies don’t change that much.
What does it look like? Basically, we have a bunch of applications in different VLANs and they will all talk to an HAProxy server, a cluster of HAProxy servers.
Proxy URL Composition
We actually have a web page that will enable application owners to decide which URL they need to use. If we look at that URL more deeply, what do we find?
After the client, its environment, the partner, the partner’s environment, and the application, there is something that’s called the SLA. What we call the SLA is actually just timeouts, because we do a lot of business. Some of it is, like, synchronous business: you expect a reply now. The other kind is like posting a one gigabyte file in the other direction or putting a file of 5 or 10 or 20 megabytes, and for that you don’t have the same timeouts as previously.
And then you have the rest of the URL because, like, in one application you will have multiple endpoints behind that URL.
Actually, the reason why we have timeouts is so that every application that we have will now follow the same rules. We don’t need to define in each application what should be the timeout, what should be…we define that in the HAProxy and basically HAProxy will do the work for us to define what will be the timeout, the same timeout for such services.
I know if you can read one of those URLs, you can actually read all of them. If I show you now another URL, well maybe not now, but if you are used to it you can now know exactly what those URLs mean.
Let’s talk a bit about access control, because now that you have a central HAProxy you want to avoid mistakes, to get a bit of…to play with HAProxy ACLs, right? We have a very basic access control right now, which is just IP based, which you can not really call security, but it prevents a lot of mistakes at least. So, that’s the first access control that we have in place: just a bunch of clients with the name and an environment.
Then we use the path_beg directive in HAProxy to say, “Okay. If the path is beginning with that URL, like “myback/prod/example/prod/www/high”, then I will use that backend.” We use the use_backend directive to say, “Okay. For that specific…if those conditions are matched…so in this case it will be that the IP matches my client and that the path is beginning with what my client should announce, then you will use that backend.” It means that for one frontend we use multiple backends depending on each one of the requests that we have. We don’t have a dozen different frontends. We route based on the URI that we get, and only if the client’s IP is recognized.
That is use_backend. Just know that the backend here is an external partner and not something that we own internally. Right.
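As a rough illustration of the routing described above, a frontend could look like the sketch below. The frontend name, the listening port, the client network, and the backend name are all hypothetical; only the path prefix and the path_beg and use_backend directives come from the talk.

```haproxy
frontend fe_egress
    mode http
    # Hypothetical listening port for the internal applications
    bind :8080

    # Hypothetical client definition: source network of the "myback" prod hosts
    acl client_myback_prod src 10.10.1.0/24

    # The path must begin with what this client is expected to announce
    acl path_example_www_high path_beg /myback/prod/example/prod/www/high

    # ACLs listed on the same line are ANDed: both must match
    use_backend be_example_prod_www_high if client_myback_prod path_example_www_high
```

One frontend can carry many such acl and use_backend pairs, one per client and partner combination.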
SLAs, well, what we called SLAs.
Basically, we set the different timers. One thing that’s common is that we don’t want to queue requests at the HAProxy level, which means that if a partner is responding slowly and we fill the number of connections that we have set for that partner, then HAProxy will return an HTTP error response to the client saying, “Hey, you know. I can’t do it now.”
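A minimal sketch of what two such SLA classes might look like follows. Every value and name here is an assumption made for illustration (the talk does not give the real timers), and using a short timeout queue is only one possible way to avoid holding requests for long when the per-server maxconn is reached.

```haproxy
# Hypothetical "high" SLA: synchronous calls that expect a quick reply
backend be_example_prod_www_high
    mode http
    timeout connect 5s
    timeout server  10s
    # Requests waiting for a free connection slot are dropped quickly
    timeout queue   1s
    # maxconn caps the concurrent connections to this partner
    server partner www.example.com:443 ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt maxconn 20

# Hypothetical "low" SLA: large uploads that may legitimately take minutes
backend be_example_prod_www_low
    mode http
    timeout connect 5s
    timeout server  10m
    timeout queue   1s
    server partner www.example.com:443 ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt maxconn 5
```

Because the timers live in the backend, every application routed to the same SLA class gets the same behavior without configuring anything itself.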
The problem with what we have seen now is that basically the request that we make to the HAProxy is not the request that we want to send to the outside world, right? I mean if I contact a partner and that’s my URL, it’s full of all those things that we have put at the beginning, that’s not going to make it correctly. So basically, because HAProxy is not a forward proxy, it will not actually…it will, by default, just pass my request like it is now, like with all that big path that we want. So, we will change the request and will make it look like it’s actually a request meant for the backend that we want to address.
What do you need to change from that request to make it like a real request? The first thing is that you will want to change the host name, the host name header, right? Because now the header will be like proxy.inuits.eu. The Host header is not correct because in this case you would expect it to be like example.com.
This is how we actually alter the query. HAProxy is full of a lot of different functions that you can use to alter your requests. In this case, we will first set the header Host to www.example.com for that specific backend. It means that it will remove the existing Host header and it will just set it to www.example.com. Then, we set the SNI in this other part. Basically, we use the str function, which takes a string as input, and then you can just specify the SNI that you want to use. By doing those two things it is just like the request was sent for www.example.com from the beginning.
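The two changes described above could be written roughly as follows. The backend name, server name, and CA file path are assumptions; http-request set-header, sni, and str are the pieces named in the talk.

```haproxy
backend be_example_prod_www_high
    mode http
    # Replace the Host header the application sent (the proxy's own name)
    # with the host name the partner expects
    http-request set-header Host www.example.com

    # Send the same name as SNI during the TLS handshake
    server partner www.example.com:443 ssl sni str(www.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt
```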
We also have resolvers mydns, and in our case we actually use a lot of IPv4. In this case we also say, “Okay, we prefer IPv4 if you support both of them.”
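A sketch of such a resolvers section and of a server line that uses it is shown below. The nameserver address and the tuning values are hypothetical; the section name mydns and the IPv4 preference are from the talk.

```haproxy
resolvers mydns
    # Hypothetical internal DNS server
    nameserver dns1 10.0.0.53:53
    resolve_retries 3
    timeout retry   1s
    # How long a valid answer is kept before asking again
    hold valid      10s

backend be_example_prod_www_high
    mode http
    http-request set-header Host www.example.com
    # Resolve the name at runtime and prefer A records over AAAA records
    server partner www.example.com:443 ssl sni str(www.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt resolvers mydns resolve-prefer ipv4
```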
Then there is that reqrep line. What the reqrep will do is replace the request line in HTTP. Basically, what the request line will contain is the method that you are using, like POST, GET, PUT, DELETE, OPTIONS, and then the URI and then the HTTP version. In our case, we want to keep the method, of course. We want to keep the GET, POST, etc. Then, we want to just remove the client and the environment, and then the “example/prod/www”, all of which are the other attributes that we set in the URI from the application.
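A rewrite of that kind could look like the sketch below, written in the HAProxy 1.8 syntax the talk is based on. The backend name is hypothetical and the prefix is the example path used earlier. Note that reqrep was removed in later HAProxy releases in favor of http-request rules, so this form only applies to older versions.

```haproxy
backend be_example_prod_www_high
    mode http
    # The request line is "<method> <uri> <version>".
    # Group 1 keeps the method; group 2 keeps everything after the routing
    # prefix (the rest of the URI plus the HTTP version).
    # Spaces inside the expressions must be escaped with a backslash.
    reqrep ^([^\ ]*)\ /myback/prod/example/prod/www/high/(.*) \1\ /\2
```

With this rule, a request line such as "GET /myback/prod/example/prod/www/high/api/v1 HTTP/1.1" would be sent to the partner as "GET /api/v1 HTTP/1.1".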
To recap: reqrep will enable you to alter the request and change the URI on the fly; set-header will set a header, replacing it if it is already there. The str function is very useful when you want to work with strings. So, the first time you write it, if you just put the string itself, it’s not working. So, you find the str function, which is very useful. And then sni enables you to, when you do the SSL connection, put the SNI information.
A Word About DNS
So DNS. Remember when I mentioned private lines? That kind of thing? So, that’s my backend and basically I have resolvers mydns inside the backend.
And then you ask, “Okay, but we cannot resolve that yet.”
And they say, “Yeah, when you can resolve it, then the service will be up.”
That’s not very helpful for us because we like to prepare stuff in advance. We also have some partners that, don’t ask me why, decide that the production and the non-production will have the same host name, but different IP addresses. So this is very inconvenient for us. What we used to do is changing the hosts file on each host, but that’s really a mess. We don’t know why that happens, but basically it’s a use case that we have supported for like ten years, so now we need to say, “Okay. We will support it. Yeah.”
It means that, because we have the sni and the http-request directives in the file, it will be completely transparent for the backend whether we used DNS or not. In those cases, that makes our life very, very easy and we don’t need to change the /etc/hosts file on all the different hosts because HAProxy is clever enough to just do all the work for us. And if we have a different backend with a different IP but the same host name, there is no issue for HAProxy with that because they are completely independent. And that’s really, really great.
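The two DNS situations described above might be handled along these lines. This is a sketch under assumptions: the host name, IP address, nameserver, and backend names are hypothetical, and the use of init-addr is one possible way to let HAProxy start before a name is resolvable, not necessarily what the speaker's configuration contains.

```haproxy
resolvers mydns
    # Hypothetical internal DNS server
    nameserver dns1 10.0.0.53:53

# Case 1: the partner's name cannot be resolved yet.
# "init-addr last,libc,none" lets HAProxy start anyway; the server stays
# down until the resolvers obtain an address for it.
backend be_partner_prod
    mode http
    http-request set-header Host partner.example.com
    server partner partner.example.com:443 ssl sni str(partner.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt resolvers mydns init-addr last,libc,none

# Case 2: non-production shares the production host name but uses another IP.
# The address is fixed; Host and SNI still carry the shared name.
backend be_partner_nonprod
    mode http
    http-request set-header Host partner.example.com
    server partner 192.0.2.20:443 ssl sni str(partner.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt
```

Since each backend sets its own Host header and SNI, the partner sees the same request either way, and no /etc/hosts entries are needed on the application hosts.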
So, some advanced topics that we have. Now that we have all of that thing in place, what can we do more? What can HAProxy bring us more than what we had in the past?
We do it just by changing the path of the request and because we are able to do ACLs and we check for that first, then it will actually enable us to switch to the new service in 10% of the cases, which is really convenient and which we can do just in the top of the configuration file.
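One way to sketch a 10% split is with the rand sample fetch, as below. The talk describes switching by changing the request path; this variant instead routes to a separate, hypothetical backend for the new service, and all names, the port, and the client network are assumptions.

```haproxy
frontend fe_egress
    mode http
    bind :8080

    acl client_myback_prod    src 10.10.1.0/24
    acl path_example_www_high path_beg /myback/prod/example/prod/www/high

    # rand(100) yields an integer from 0 to 99, so "lt 10" matches
    # roughly one request out of ten
    acl canary rand(100) lt 10

    # Checked first: about 10% of matching requests go to the new service
    use_backend be_example_prod_www_high_new if client_myback_prod path_example_www_high canary

    # Everything else keeps using the current service
    use_backend be_example_prod_www_high if client_myback_prod path_example_www_high
```

The order matters: use_backend rules are evaluated top to bottom and the first match wins.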
And we were like, “Ugh. Do we really want to work at 10 a.m. on Sunday just to make sure that the partner will be happy that we called the new URL?” So, instead of that, we actually use the date function in HAProxy, which means that on Sunday at 10 a.m., at that moment exactly, we knew that we would use the new service at the partner; and no one needed to, like, wake up on Sunday or work on Sunday. It was just all done magically via HAProxy.
This is actually really, really great because you can actually plan the changes in advance. You can plan the rollout, you can plan…a bunch of this stuff relies on that date function. Before, it was painful because you had no other solution than, like, “Okay. We will restart all the different applications on Sunday at 10 a.m. just to change the URL.” Now, we do all the work at the HAProxy level and it is clever enough to do it just when we want it to.
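A scheduled switch of that kind can be sketched with the date sample fetch, which returns the current time as a Unix timestamp. The timestamp, backend names, port, and client network below are hypothetical; the chosen value corresponds to Sunday 2019-12-01 10:00:00 UTC.

```haproxy
frontend fe_egress
    mode http
    bind :8080

    acl client_myback_prod    src 10.10.1.0/24
    acl path_example_www_high path_beg /myback/prod/example/prod/www/high

    # True once the current time reaches the planned switch-over moment
    # (1575194400 = Sunday 2019-12-01 10:00:00 UTC, a made-up date)
    acl after_switch date ge 1575194400

    # From that moment on, matching requests go to the partner's new service
    use_backend be_example_prod_www_high_new if client_myback_prod path_example_www_high after_switch

    # Before that moment, the existing service is used
    use_backend be_example_prod_www_high if client_myback_prod path_example_www_high
```

The rule can be deployed days ahead; nothing has to be reloaded or restarted when the moment arrives.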
You also have a partner who is like, “I want to make sure that you use TLS 1.2.” And how do you do that? Well, you have a nice force-tlsv12 in the configuration. I’ve seen that in the latest release, you also have force-tlsv13. Basically, it makes sure that HAProxy will only talk TLS 1.2 to the partners. So, you can tell them, “Okay. No, if you roll back to TLS 1.0 and disable TLS 1.2, we won’t talk to you any more, but you asked for it, you have it, right?”
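On the server line, that option could be used as in this sketch. The backend name, host name, and CA file path are assumptions; force-tlsv12 is the option named in the talk.

```haproxy
backend be_example_prod_www_high
    mode http
    http-request set-header Host www.example.com
    # force-tlsv12 restricts the outgoing connection to TLS 1.2 only,
    # whatever TLS version the calling application itself supports
    server partner www.example.com:443 ssl force-tlsv12 sni str(www.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt
```

The application can keep talking plain HTTP or an older protocol to the proxy, while the partner only ever sees TLS 1.2.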
Setup and Maintenance
Let’s take a look now at the setup and the maintenance of all that setup that we have with HAProxy.
What’s important is that for the application owners, for the people that need to connect to the proxy, it’s very easy for them. Because, all they need to tell us is who will be the client, who will be the partners, and which kind of SLA do they want us to configure. Basically, that’s pretty simple YAML files with a couple of lines for each entry and then we turn that into the gigantic HAProxy configuration using Ansible.
It means that we have an abstraction on top of that big HAProxy configuration, which removes a lot of the mistakes that you can make and a lot of the strange things that you could otherwise do. So basically, we know in advance which kind of input we can get from the application maintainers and we can abstract that nicely. It also means that our developers don’t need to know everything about HAProxy. They don’t need to know about the configuration. They don’t need to know anything. For them, it is just like providing a URL and a bunch of other inputs and then that will just work out of the box.
So, when we look at one line of the file, you will see the client, your environments, also the partner, the environment, the application. What’s interesting is that when we change the URI, like when we want to send 10%…the example at 10 a.m. on a Sunday…actually, the URI that you will find in the logs will be the original URL. So, the URL that the application wanted to call, but the backend that you will see in the log is, obviously, the backend that was actually used. You can see which request was done on which backend, even when you have those rules like the Sunday at 10 a.m. rule or the 10% to another service rule. Basically, the log line doesn’t lie and it shows the correct input.
Then you see the status and the duration, which enables us to build a dashboard with the RED method, which means we see the requests, we see the errors, and we see the duration. So, we can see when something is going wrong. What we are using is Prometheus. We are big users of Prometheus. Regarding monitoring, we are quite up to date. We use Grafana and the HAProxy exporter. We are still on 1.8, so we don’t have the native metrics yet, and we use mtail.
This is very tied to our usage and it’s very flexible. You can do a lot with that and at the end we have a full visibility on what’s going on, which partner is failing, when that partner is failing, that kind of thing. Then we can do very precise alerting to know which application is failing now to talk to its partner. Sometimes, it will be because of the application, sometimes because of the partner, and we have, actually, a really nice view on that thanks to this.
We can now be quickly alerted when something is not responding, when something is down. It means that when we need to change, when we need to analyze a failure at the partner, we know it directly. So, we can already see ourselves the business impact. Maybe for the partner, just an application will not be able to talk to a service at the partner. We see it directly now. We see it at the HAProxy level and we don’t need to look at different places. So, HAProxy will just centralize all those things.
It also means that even a 20-year-old application will talk TLS 1.2 now, which means that we are not blocking partners from making their TLS stack evolve. We are just very happy to say, “Okay, yeah, we will support TLS 1.2. We will support SNI. Just do whatever you want. We will not block you.”
It also means that the timeouts are unified across all the stacks and the TCP retries, as well, are just unified. Also, the two-way SSL is now delegated to HAProxy. It means that when we need to deploy the client certificate, we do it just on the HAProxy cluster and not on four different applications in five different VLANs, that kind of thing. It’s not a pain any more because we delegate that to HAProxy itself. And all that’s nice, so we don’t need to work on Sunday morning. We can kind of take it easy, that kind of thing. It’s very nice and very helpful with HAProxy. So, we really like it.
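Delegating two-way SSL to the proxy might look like the sketch below. The certificate path, backend name, and host name are hypothetical; the crt keyword on a server line tells HAProxy which client certificate to present to the partner.

```haproxy
backend be_example_prod_www_high
    mode http
    http-request set-header Host www.example.com
    # crt points to a PEM file holding the client certificate and its key;
    # it is presented to the partner during the TLS handshake
    server partner www.example.com:443 ssl crt /etc/haproxy/certs/myback-client.pem sni str(www.example.com) verify required ca-file /etc/ssl/certs/ca-certificates.crt
```

Renewing the client certificate then only touches the HAProxy cluster, not each application.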
It also means that when we want to change something, we can usually do it. When we want to change the URL, when we want to change something, we don’t need to restart the applications any more. You can imagine that a 20-year-old application may need to be restarted multiple times to change or tune the URL, and usually you want to avoid that. So, basically, it’s very helpful for us to be able to say, “Yeah, if we need to change some egress configuration, we can just do it at the HAProxy level and then, maybe, if that’s needed, we can change the configuration properly at the application side.” But just being able to do it at the HAProxy level only prevents us from having to restart those very old applications. So, thank you!