In this presentation, Antonin Mellier and Nicolas Besin explain why SNCF, the French National Railway Company, chose HAProxy as a core element of its custom CDN, built to replace Akamai. By having HAProxy as the entry point and exit point of their CDN, they are able to offload SSL encryption, gain invaluable insights about errors and cache hit ratios, and accelerate troubleshooting. They use HAProxy for server persistence with cookies, weighted routing, and detection of abnormal user behavior.
Hello, today we are sharing how at Oui.sncf we built our own CDN with HAProxy. My name is Antonin Mellier and I am a technical architect at E.Voyageurs SNCF. And my name is Nicolas Besin and I am also a technical architect at E.Voyageurs SNCF.
Let me say a few words about E.Voyageurs SNCF. E.Voyageurs SNCF is the digital company of the SNCF, the French National Railway Company. We operate many applications and websites. As an example, we operate Oui.sncf. This website has about 16 million unique visitors per month and, during our peak periods, we sell more than 40 train tickets per second. We also operate SNCF’s mobile app. This application has been downloaded more than 12 million times from the application stores and has about 40 million visitors per month.
At Oui.sncf, our journey with HAProxy started a long time ago, in 2009. At that time, we had a basic three-tier application with Apache web servers, WebLogic application servers, and Oracle databases, and we were using the WebLogic plugin to load balance traffic between Apache and our application servers. The problem was that we had no visibility into what happened on our infrastructure. We were facing a lot of incidents, and debugging and understanding the issues was difficult.
Then, Willy came to our office in Lille and presented HAProxy to us. After that, we added the HAProxy statistics page and logs. We didn’t talk about performance or load balancing capabilities; we just talked about monitoring and observability. Just by looking at HAProxy logs or stats, we were able to understand the problems and issues we had in production and start working on how to fix them.
After that, we added HAProxy in front of all of our applications. Today, we have more than 100 applications in production with HAProxy in front of them. We have added HAProxy in front of our LDAP servers, SMTP servers, and some of our databases. More recently, we have chosen HAProxy as a control plane for our Kubernetes clusters. But today we are not here to tell you how we use HAProxy on our on-premises infrastructure. We will tell you why, in 2014, we decided to use HAProxy in our CDN solution and why we built our own CDN to replace our previous Akamai solution.
So, let me briefly explain what a CDN is and why we made the decision to build our own. CDN stands for Content Delivery Network. It’s an interconnected network of cache servers, which speeds up the display and delivery of web content. Cache servers must be located as close as possible to the users in order to make the most requested resources available very quickly. The best-known CDN providers are Akamai, Cloudflare, and CloudFront.
Using a CDN offers many benefits. It helps minimize the delay in loading web page content by reducing the physical distance between the user and the server. With its cache function, it can respond to end user requests in place of the origin. For a website like ours, only 10% of requests reach the origin server. It also provides protection against high traffic peaks, in the case of marketing events or DDoS attacks, for example. Finally, it saves money: CDN bandwidth is cheaper than traditional hosting bandwidth.
CDN infrastructure is composed of three types of elements: DNS servers, edge servers, and origin servers. The first component of a CDN, which is also a key component, is DNS. When users want to access our website, they first query the DNS server of their ISP to get the IP address associated with the requested domain name. In this example, an Internet user wants to access the Oui.sncf website. The DNS server of their ISP queries the DNS server of Oui.sncf, which answers that the DNS record is an alias of a record managed by us. Finally, our DNS servers return, according to certain rules such as availability or DNS geolocation, a record with a short TTL pointing to available cache servers.
Once the IP address is retrieved, the user is able to connect to one of our edge servers. The main purpose of an edge server is to cache the static resources. If the edge server has the resource in cache, it will deliver it to the client. Otherwise, it will contact the origin server to get it. The origin server is a datacenter hosting the client application.
You may ask yourself: why bother building our own CDN instead of using an existing, off-the-shelf solution? Before building our own CDN, we used a large CDN provider, Akamai. Costs were based on bandwidth and, as our business grew, costs increased to significant levels. It was costing us €1.2 million per year; now, by managing it on our own, it costs us three times less. Moreover, many of the features we were using were paid options, like real-time log analysis, SSL certificates, multiple origins, or A/B testing. As the majority of our users are located in France, we don’t need a worldwide solution. Technically speaking, by managing both DNS and CDN, there is almost no third party between ourselves and our clients. We control our platform from end to end and there is no black box in our application workflow.
So, how did we build it? First of all, I would like to point out that one of the key elements that made this project a success was that we were the customers of the solution we had to implement. So, it was easy for us to decide on and prioritize which elements to implement.
First of all, let me say a few words about how we chose our hosting providers. For performance reasons, we wanted dedicated hardware, so we were looking for hosting providers that could offer us bare-metal servers with dedicated pools of public IPs. As we are a small CDN, we can afford not to share IPs between our different clients. We were looking for providers that could offer us DDoS protection and guaranteed bandwidth. Our hosting providers had to have good network connectivity with consumer (B2C) ISPs and with our datacenters’ ISPs. And finally, we wanted to be able to install, configure, and manage the operating system ourselves. Keeping all those prerequisites in mind, we chose four providers spread across six datacenters: OVH and Online, which are major European providers, and BSO and Iguane Solutions, which are less well-known providers but have very good network connectivity.
As we told you before, to operate a CDN we need a DNS infrastructure. Most of the features needed by a CDN’s DNS are not available in traditional DNS products, such as BIND or PowerDNS, and these features are not available from traditional DNS providers either. To operate a CDN’s DNS, we needed geolocation, to return different IPs depending on the client’s IP; IP pool management, to return a predefined number of IPs from an available pool; and weighted shuffle, to assign a different weight to each of our edge servers.
So, after studying several products, we chose GeoDNS, written in Go by Ask Bjørn Hansen. This product powers the NTP Pool system. With this product, the DNS zones are described as JSON files. So, to update a zone, you just have to upload the JSON file and the server automatically reloads the configuration. In our case, hot reloading is a very important feature because we have to update the DNS configuration frequently; for example, when we detect problems, when we plan a maintenance operation, or when we add new clients.
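As an illustration, a GeoDNS zone file could look roughly like the sketch below. The domain, IPs, and labels are invented, and the exact schema should be checked against the GeoDNS documentation; the general idea is that A records are [IP, weight] pairs, region-suffixed labels like "www.europe" serve geolocated clients, and "max_hosts" caps how many records are returned from the pool.

```json
{
  "serial": 3,
  "ttl": 30,
  "max_hosts": 2,
  "data": {
    "": {
      "ns": ["ns1.example-cdn.fr", "ns2.example-cdn.fr"]
    },
    "www": {
      "a": [["192.0.2.10", 100], ["192.0.2.11", 50]]
    },
    "www.europe": {
      "a": [["198.51.100.10", 100]]
    }
  }
}
```

Dropping an updated file like this into the zone directory is what makes hot reloading possible: the server watches the files and picks up changes without a restart.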
In addition to our own DNS servers, we added managed DNS providers. In fact, there are not a lot of companies that offer all the DNS features needed by our CDN. After a quick comparison, we chose Dyn, which offered us advanced DDoS protection and a global presence all over the world to speed up our DNS queries. With this multi-provider DNS architecture, we are able to distribute traffic between our two solutions, giving us automatic load balancing and failover in case of problems or latency on some of our servers.
On our CDN stack, caching is performed by Varnish. At Oui.sncf, we had used Varnish for years before building our own CDN, so this time it was obvious that we would use Varnish as our caching solution. Thanks to the size of our CDN, most objects are stored directly in memory, which makes resources available faster. In this stack, some components may look redundant, because HAProxy can handle some of the same tasks as NGINX, and NGINX and Varnish can perform some of the same tasks too. But we decided to use each product for what it was designed for, rather than forcing everything into a single product that was not initially made for the need.
The last thing I would like to say is that the last component the HTTP request passes through is HAProxy. By having HAProxy as the entry point and exit point of our CDN, we have a standardized view of our inputs and outputs. So, it’s easy for us to determine error and cache ratios, and when we have a problem, it’s easy for us to determine whether an error came from our servers or from the origins.
Now we will focus on the HAProxy part: how we use it every day and how it helps us detect problems or failures. First, I will explain how we manage configuration files. We have developed an application that acts as a CMDB. This application stores the parameters of our infrastructure and of each website or mobile app that uses our CDN. For each website, we store various information like caching rules, the servers where we deploy it, the IPs or ports that will be used, and the features we want to activate: HTTP/2, IPv6, or SNI, for example. We can interact with this tool through an API or a web interface. Our configuration files are generated in Ruby from ERB templates and pushed to our Git repository. After that, all our configuration is deployed by Ansible. When it’s possible to apply changes via the CLI, we use the Unix socket instead of reloading the HAProxy process.
Now, I will explain how HAProxy interacts with the origin servers. As we explained before, Oui.sncf applications are hosted in two datacenters in active/active. Each datacenter is reached through its public IP and is seen by HAProxy as a server. For each application, we can define what percentage of the traffic we want to route to each datacenter. For applications that are not stateless, we use a cookie for datacenter persistence. In case of maintenance, our Ops team can modify this percentage with a script that calls the Unix socket to update the weight of the servers and regenerates the HAProxy configuration file without reloading the process.
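A backend implementing this pattern might look roughly like the following sketch; the backend name, IPs, health check path, and the 70/30 split are all invented for illustration, not taken from the actual configuration:

```
backend bk_website
    balance roundrobin
    option httpchk GET /health
    # datacenter persistence for stateful applications
    cookie DC insert indirect nocache
    # route ~70% of new sessions to DC1 and ~30% to DC2
    server dc1 203.0.113.10:443 weight 70 cookie dc1 check
    server dc2 203.0.113.20:443 weight 30 cookie dc2 check
```

The weights can then be changed at runtime through the stats socket without a reload, for example with `echo "set weight bk_website/dc1 50" | socat stdio /var/run/haproxy.sock`.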
For each application, we have set up an HTTP health check, so if we lose a datacenter, all the traffic is automatically redirected to the other one. We can also route requests with rules based on access paths by using ACLs. We provide a media server that allows contributors to easily publish content such as images for email campaigns or large videos for the website; HAProxy uses ACLs to redirect requests to this server. With the same mechanism, we can also manage the parts of the website that are hosted in the public cloud, like AWS.
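This kind of path-based routing can be expressed with a few ACLs, as in the sketch below; the paths, certificate directory, and backend names are hypothetical:

```
frontend ft_www
    bind :443 ssl crt /etc/haproxy/certs/
    # route media publishing and cloud-hosted paths to dedicated backends
    acl is_media path_beg /media/
    acl is_cloud path_beg /cloud/
    use_backend bk_media if is_media
    use_backend bk_aws   if is_cloud
    default_backend bk_origin
```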
Now, let’s talk about the monitoring part. We use Prometheus to collect the metrics of our different components. On each server we installed haproxy_exporter, varnish_exporter, and node_exporter for hardware and OS metrics. As we still use HAProxy 1.9, we don’t yet have the built-in Prometheus exporter. All these metrics are collected by our Prometheus server and we can define alerts for various indicators like HTTP error rate or origin states. With Alertmanager, which is part of the Prometheus ecosystem, we can group and route alerts to email, Microsoft Teams, or Opsgenie, for example. For the visualization of metrics we use Grafana and Promviz.
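As an example of such an alert, a Prometheus rule on haproxy_exporter metrics could look like this sketch; the thresholds and severity label are illustrative, not the ones actually used:

```yaml
groups:
  - name: cdn
    rules:
      - alert: HighHttp5xxRate
        # 5xx share of all backend responses over the last 5 minutes
        expr: |
          sum(rate(haproxy_backend_http_responses_total{code="5xx"}[5m]))
            / sum(rate(haproxy_backend_http_responses_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "HTTP 5xx error rate above 5% on the CDN"
      - alert: OriginDown
        # haproxy_server_up is 0 when a health check marks the origin down
        expr: haproxy_server_up == 0
        for: 2m
        labels:
          severity: critical
```

Alertmanager then groups these alerts and routes them to the configured receivers (email, Microsoft Teams, Opsgenie, and so on).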
Promviz is a fork of a Netflix product that uses Prometheus data to display the traffic and error rates between our servers. It is based on the haproxy_exporter and aggregates stats data to generate a dynamic view in WebGL. Honestly, it’s more fun and pretty than really useful for debugging or analysis.
This next part is one of the most important, and one we use daily. Because HAProxy is the best source of information about the behavior of our CDN, we use it for debugging and for monitoring our platform. With our data engineering team, we have built a solution that streams HAProxy logs in real time to our on-premises datacenters. Locally, on each of our edge servers, HAProxy sends its logs to rsyslog. rsyslog then forwards the logs to an Apache Flume process.
Flume is in charge of duplicating the logs to our on-premises datacenters. It handles encryption and buffers locally in case of problems or latency. Once the logs arrive on our on-premises infrastructure, they are stored in Kafka topics. Then, they are consumed to be stored in Elasticsearch for real-time analysis and in Hadoop for long-term analysis. Before being stored in Elasticsearch, each log line is parsed and each field of the log is named and typed. We have developed Spark jobs that read the raw logs in Hadoop, aggregate them, and store them in Elasticsearch. With this aggregated data, we are able to perform long-term analysis of our application usage.
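As a rough sketch of that parsing step, here is how a default-format HAProxy HTTP log line can be split into named, typed fields. The regex, field names, and sample line are simplified illustrations, not the production parser running in the Kafka-to-Elasticsearch pipeline:

```python
import re

# Covers a subset of the default HAProxy HTTP log format:
# client_ip:port [accept_date] frontend backend/server timers status bytes ... "method path version"
HTTP_LOG = re.compile(
    r'(?P<client_ip>[\d.]+):(?P<client_port>\d+) '
    r'\[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>[\d/-]+) (?P<status>\d{3}) (?P<bytes>\d+) '
    r'.* "(?P<method>\S+) (?P<path>\S+) (?P<http_version>[^"]+)"'
)

def parse_line(line: str) -> dict:
    m = HTTP_LOG.search(line)
    if m is None:
        raise ValueError("unparsable log line")
    doc = m.groupdict()
    # type the numeric fields so Elasticsearch can aggregate on them
    for field in ("client_port", "status", "bytes"):
        doc[field] = int(doc[field])
    return doc

line = ('192.0.2.10:51034 [10/Oct/2019:13:55:36.123] ft_cdn bk_origin/dc1 '
        '0/0/1/12/13 200 2750 - - ---- 10/10/0/0/0 0/0 '
        '"GET /index.html HTTP/1.1"')
doc = parse_line(line)
print(doc["status"], doc["backend"], doc["path"])  # → 200 bk_origin /index.html
```

Each resulting document is then ready to be indexed, with numeric fields typed for aggregations and the rest kept as keywords.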
Thanks to this architecture, we are able to produce many dashboards and run queries on any field coming from the HAProxy logs. We use these dashboards on a daily basis and, just by using them, we are able to diagnose almost every incident. The dashboards on the screen show the real-time bandwidth, the HTTP return codes per application, and the traffic distribution between our customers.
The last thing we would like to show you is how we deal with stick tables. Currently, we use stick tables to determine whether any IP shows abnormal behavior. In production, stick tables are only local to each server. We have scripts that periodically collect the contents of these tables, aggregate them, and store them in a Graphite database. With this solution, we are able to know how many resources are consumed by our top clients. But if we want to know which IPs consumed the resources, we have to look in our log files.
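A per-IP tracking setup of this kind might look like the following sketch; the table size, rate windows, and frontend name are invented, and this illustrates tracking only, not any blocking policy:

```
frontend ft_www
    bind :443 ssl crt /etc/haproxy/certs/
    # one entry per client IP, tracking request, error, and bandwidth rates
    stick-table type ip size 1m expire 30m store http_req_rate(10s),http_err_rate(10s),bytes_out_rate(60s)
    http-request track-sc0 src
    default_backend bk_origin
```

The table contents can then be dumped for aggregation with the stats socket, for example `echo "show table ft_www" | socat stdio /var/run/haproxy.sock`.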
That’s the reason why, last summer, we decided to implement a new solution based on HAProxy’s peers mechanism. This solution is currently being tested in our staging environment. With the new solution, every stick table is pushed to our monitoring servers and, because the communication between peers is not encrypted, we have added a dedicated frontend and backend in charge of encrypting and decrypting the traffic. With this new solution, on our monitoring server we have one stick table per edge server. We considered several solutions for collecting the centralized stick table data and finally decided to use a Python script that reads the centralized stick tables and exposes the data as Prometheus metrics.
Here is what our whole edge server configuration looks like. As you can see, we have one peers section with two peers: the edge server and our monitoring server. You can notice that they both use local ports. We have one backend containing only a stick table; this backend is named with the server ID. We have two frontend/backend pairs. The first one is in TCP mode, listens on the public IP with SSL enabled, and decrypts the stick table messages coming from our monitoring server. The second one sends messages to our monitoring server, encrypting them before they are sent. In fact, because our monitoring server has no activity on its stick tables, even though the sync mechanism works both ways, our edge server only pushes messages to our monitoring server and receives no write requests.
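The pattern described above could be sketched roughly as follows on an edge server; every name, port, and IP here is illustrative, and the real configuration shown on the slide differs in its details:

```
peers edge01_peers
    # both peers point at local ports; TLS is handled by the pairs below
    peer edge01  127.0.0.1:10000
    peer monitor 127.0.0.1:10001

# backend named after the server ID, holding only the stick table
backend st_edge01
    stick-table type ip size 1m expire 30m store http_req_rate(10s) peers edge01_peers

# inbound pair: decrypt peer traffic arriving from the monitoring server
frontend ft_peers_in
    mode tcp
    bind 203.0.113.5:11000 ssl crt /etc/haproxy/peers.pem
    default_backend bk_peers_local

backend bk_peers_local
    mode tcp
    server local 127.0.0.1:10000

# outbound pair: encrypt local peer traffic before sending it to the monitoring server
frontend ft_peers_out
    mode tcp
    bind 127.0.0.1:10001
    default_backend bk_peers_monitor

backend bk_peers_monitor
    mode tcp
    server monitor 198.51.100.20:11000 ssl verify none
```

The peers protocol itself is unaware of the TLS wrapping: it simply talks to the local ports, and the TCP frontend/backend pairs tunnel the traffic encrypted across the public network.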
Here is what the configuration looks like on our monitoring servers. We have one peers section per edge server and, as configured on our edge servers, they listen on local ports. We have one stick table per edge server. It’s not visible in this example but, as on our edge servers, we have dedicated frontends and backends that encrypt the messages before sending them to our edge servers. You can notice one significant difference on the last frontend: there is only one frontend, and it uses an ACL based on the source IP to route the messages to the right stick table.
I didn’t say it before, but all this configuration is generated by Ansible during our installation or update process, and all the information is taken from our CMDB. So, even if this configuration seems complex, it is in fact just a simple Jinja template.
Finally, I told you that we have a Python script that collects the data. In fact, we found on GitHub an exporter made by Sportradar, which automatically retrieves all the data from the stick tables and exports it as Prometheus metrics. So, thanks to Prometheus, we have the same information we previously had in Graphite but, thanks to Prometheus labels, we are now able to easily filter on client IP or on edge server name. After listening to the talk yesterday from Marko, I think, about the Data Plane API and the native client, I think we will soon try that to perform more complex aggregations on this information centralized on our monitoring servers.
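To illustrate the idea behind such an exporter, here is a minimal sketch that parses the output of HAProxy’s "show table <name>" socket command and turns each entry into a Prometheus-style sample labelled by client IP and edge server. The metric name, sample output, and parsing are invented for illustration; in practice the Sportradar exporter does this work:

```python
import re

# Matches one stick-table entry line, e.g.
# 0x5648: key=192.0.2.10 use=0 exp=1799000 http_req_rate(10000)=42
ENTRY = re.compile(r'key=(?P<key>\S+).*?http_req_rate\(\d+\)=(?P<rate>\d+)')

def table_to_samples(table_output: str, edge: str) -> list:
    """Turn 'show table' output into (metric, labels, value) samples."""
    samples = []
    for line in table_output.splitlines():
        m = ENTRY.search(line)
        if m:
            samples.append(
                ('haproxy_sticktable_http_req_rate',
                 {'client_ip': m.group('key'), 'edge': edge},
                 int(m.group('rate')))
            )
    return samples

raw = """# table: st_edge01, type: ip, size:1048576, used:2
0x5648: key=192.0.2.10 use=0 exp=1799000 http_req_rate(10000)=42
0x5650: key=198.51.100.7 use=0 exp=1799000 http_req_rate(10000)=3
"""
for name, labels, value in table_to_samples(raw, edge="edge01"):
    print(name, labels, value)
```

With the labels in place, Prometheus can filter or aggregate by client_ip or edge, which is exactly what was missing from the Graphite-based aggregation.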
So, five years later, what are the lessons from this project? I think we can say that we have built a reliable solution that fits our needs perfectly. Compared to public CDN solutions, there are some drawbacks; not from our point of view, but there are tasks generally handled by CDN providers that we have to handle ourselves. For example, last year we implemented IPv6 ourselves. The same goes for HTTP/2: we have to configure it and validate it ourselves. We have to manage software updates and operating system updates, and we have to apply security patches too. All these tasks are invisible when dealing with traditional CDN providers, but on our side we have to anticipate them, and that takes a significant amount of time.
On the other side, there are many advantages. We were a small team, just four people, and we built an entire infrastructure from scratch. It was, and still is, a big technical challenge. By having to manage all the components of the infrastructure ourselves, we have all increased our technical skills. Our clients use the monitoring and metrics that we provide to them daily, which improves their diagnostic capabilities and reduces incident duration. We all know exactly how our platforms behave.
And finally, the solution we implemented costs less than a third of what we paid before with Akamai. When we launched our CDN solution, there were 18 applications on it; today, there are about 40. During this period, our CDN bandwidth increased by more than 80%, all with the same infrastructure running cost.