As a leader in digital travel for more than 25 years, Booking.com has grown up with the internet. From a small Dutch startup founded in 1996, it has expanded into an online giant that seamlessly connects millions of travelers every year with accommodation, flights, and rentals. As one of the world’s largest online travel marketplaces, Booking.com connects both established brands and small entrepreneurs to a global audience, all while providing 24/7 customer support.
Like every stalwart of the digital era, Booking.com’s network organization had long suffered from an antiquated setup left over from its early days online. When site reliability engineer Marcin Deranek took over in 2008, the load balancing infrastructure was a pair of Linux servers running IP Virtual Server (IPVS) in an active/standby configuration for failover. The team was, however, running into ARP caching problems that thwarted the failover setup, along with the limited customizability typical of load balancing at this layer.
Early on, this encouraged Marcin to try an F5-based layer 7 load balancing solution. The initial system consisted of dual F5 BIG-IP Local Traffic Managers, which made it possible to alter payloads and to inject or remove headers, none of which the team had been able to tailor before. However, the F5 load balancers were physical servers that did not lend themselves to configuration automation. Upgrading them and tailoring their settings required manual steps, which took its toll even after the team developed a series of scripts for the job and implemented Puppet to ease the load. The system was also only vertically scalable, and as Booking.com’s customer base grew, the team ended up with a messy tree of differing generations of F5 hardware. Thus began the journey toward a software-based approach to load balancing.
These F5 load balancers were working fine, but we wanted to do more.
The Booking.com team’s initial goals were to eliminate the overhead of cycling through hardware generations and to surpass the limits on the number of sessions imposed by their F5 setup. This led them to replace the F5 BIG-IP servers with an internal Load-Balancer-as-a-Service system built on software-based HAProxy Enterprise load balancers and Equal-Cost Multi-Path (ECMP) routing. ECMP allows network packets to be routed along multiple paths of equal cost. Here, a first tier of fabric-layer and top-of-rack switches operating at Layer 3 forwarded packets to a tier of HAProxy Enterprise load balancers, which ultimately distributed traffic to the application servers. The result was per-flow load balancing: the hash computed at the first tier kept every packet of a given flow on the same path for the flow’s lifetime, delivering it to the same HAProxy load balancer pair underneath.
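The per-flow behavior can be illustrated with a short Python sketch. It is not Booking.com's switch configuration or any vendor's hashing scheme: the five-tuple fields, the use of SHA-256, the `pick_next_hop` helper, and the load balancer names are all assumptions chosen for illustration.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Five-tuple identifying a flow (the field choice is an assumption)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Map a flow to one of several equal-cost next hops.

    Real switches use hardware hash functions; SHA-256 stands in here
    only because it is deterministic and in the standard library.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical load balancer names, not taken from the case study.
load_balancers = ["lb-1", "lb-2", "lb-3", "lb-4"]
flow = Flow("198.51.100.7", 51512, "203.0.113.10", 443, "tcp")

# Every packet of the same flow hashes to the same load balancer.
first = pick_next_hop(flow, load_balancers)
second = pick_next_hop(flow, load_balancers)
print(first == second)
```

One limitation of plain modulo hashing is that changing the number of next hops remaps most flows, so a production design would need to account for that.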
The team also wanted to ensure that they were routing network traffic to the correct destinations based on location and latency metrics. To that end, the new software-based HAProxy Enterprise load balancing system was implemented in combination with Anycast, an approach in which sender traffic is routed according to network topology. In practice, this meant selecting paths by lowest cost, distance, hop count, or measured latency, with a preference for locality.
In order to manage this new system in a way that could handle the billions of requests Booking.com was beginning to receive per day, Marcin also spearheaded the building of their own API, which they aptly titled ‘Balancer’.
This custom software would be used to manage their entire load balancing platform. Organized around different clusters for different environments, the Balancer API sits at the center of the distribution network and stores its configuration as objects with attributes, along with the relationships between them. Context-aware business logic could therefore use the API to assign servers to back-end pools automatically and use smart traffic routing to relay packets according to set requirements such as even traffic distribution and session zone stickiness.
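As a rough illustration of configuration stored as objects with attributes, the sketch below assigns servers to pools by matching attributes. The Balancer API's actual data model is not described here, so the `Server` and `Pool` classes, the `selector` field, the `assign_servers` function, and all names and attribute values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    attributes: Dict[str, str]


@dataclass
class Pool:
    name: str
    # Attributes a server must carry to join this pool (hypothetical).
    selector: Dict[str, str]
    members: List[str] = field(default_factory=list)


def assign_servers(pools: List[Pool], servers: List[Server]) -> None:
    """Attach each server to every pool whose selector it satisfies."""
    for pool in pools:
        pool.members = [
            server.name
            for server in servers
            if all(server.attributes.get(key) == value
                   for key, value in pool.selector.items())
        ]


pools = [
    Pool("search-ams", {"role": "search", "zone": "ams"}),
    Pool("search-lon", {"role": "search", "zone": "lon"}),
]
servers = [
    Server("app-1", {"role": "search", "zone": "ams"}),
    Server("app-2", {"role": "search", "zone": "lon"}),
    Server("app-3", {"role": "search", "zone": "ams"}),
    Server("app-4", {"role": "payments", "zone": "ams"}),
]

assign_servers(pools, servers)
for pool in pools:
    print(pool.name, pool.members)
```

Keeping the matching rule on the pool means a newly provisioned server joins the right pools as soon as its attributes are recorded, without anyone editing pool membership by hand.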
In addition, the API automated the provisioning of servers based on attributes defined for each existing pool. Each instance of HAProxy Enterprise could also be configured more precisely through a balancer agent, which enabled a dialogue between the API, the top-of-rack switch, and the BIRD routing daemons configured by an Anycast healthchecker.
The system admins also took advantage of the unmatched observability of HAProxy by attaching a daemon to each balancer to process the data it captured. The daemon periodically reads statistics over the HAProxy stats socket and sends them to Graphite, while access and error logs are captured by Rsyslog and delivered to an Elasticsearch cluster, putting a wealth of information at the disposal of their customized UI.
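A minimal sketch of the statistics path might look like the following. It converts CSV in the shape of HAProxy's `show stat` output into lines for Graphite's plaintext protocol (`path value timestamp`). The sample data, the metric prefix, the `WANTED` column selection, and the `to_graphite_lines` helper are assumptions; Booking.com's actual daemon is not shown in this article. Reading from the stats socket and sending to Graphite (commonly TCP port 2003) are left out so the example runs offline.

```python
import csv
import io
from typing import List

# Abbreviated sample in the shape of HAProxy's "show stat" CSV output.
# Real output has many more columns; these values are made up.
SAMPLE = """# pxname,svname,scur,stot,bin,bout,
www,FRONTEND,12,3400,120000,980000,
app,srv1,5,1700,60000,490000,
"""

# Columns to export (an arbitrary selection for illustration).
WANTED = ("scur", "stot", "bin", "bout")


def to_graphite_lines(stat_csv: str, prefix: str, timestamp: int) -> List[str]:
    """Turn 'show stat' CSV text into Graphite plaintext lines."""
    lines = stat_csv.strip().splitlines()
    # The header starts with "# " and each line ends with a trailing comma.
    header = lines[0].lstrip("# ").rstrip(",").split(",")
    rows = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header)
    out = []
    for row in rows:
        for metric in WANTED:
            value = row.get(metric)
            if value in (None, ""):
                continue  # HAProxy leaves fields empty when not applicable
            path = f"{prefix}.{row['pxname']}.{row['svname']}.{metric}"
            out.append(f"{path} {value} {timestamp}")
    return out


graphite_lines = to_graphite_lines(SAMPLE, "haproxy.lb1", 1700000000)
for line in graphite_lines:
    print(line)
```

In a running daemon, the timestamp would come from the current time and the resulting lines would be written, newline-terminated, to the Graphite listener on each collection interval.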
HAProxy has plenty of different features so what you want to actually dump there it’s up to you. There’s lots of data.
This software-based approach to load balancing gave the team unprecedented visibility through the user interface of their Balancer API. They created a series of graphical displays that crunched the data provided by HAProxy Enterprise so that colleagues less familiar with load balancing could use the service, visualizing the metrics as well as how each object in the system was connected.
The flexibility of HAProxy Enterprise within a Load-Balancer-as-a-Service setup meant that the team could scale, visualize, and route in ways they never could before. The smart routing also provides protection against multiple failure scenarios, all of which can be addressed at line rate instead of pulling out the cables and screws of their former F5 or Linux IPVS systems.
Booking.com has also capitalized on the dialogue between HAProxy Technologies and its community as each iteration of HAProxy Enterprise becomes available, communicating its desire for TCP Fast Open on the back-end side and its move toward HTTP/3.
What HAProxy Enterprise Offers You
While Booking.com implemented a custom solution based on HAProxy Enterprise, you can get many of the same benefits with the HAProxy Fusion Control Plane. The Fusion Control Plane ships with a fully featured API, support for smart routing based on server health, a rich user interface, and more. Contact us to learn more.
Interested in learning more about HAProxy use cases? Explore our Success Stories page.