How Criteo handles 23M requests per second (RPS) with HAProxy Runtime API automation

Criteo handles 23 million requests per second (RPS) while maintaining peak performance and minimizing downtime. For most organizations, handling that level of traffic is just a theoretical stress test — a what-if scenario should their infrastructure ever be overwhelmed by an unexpected wave of requests. But for Criteo, 23 million RPS is just another Tuesday.

As the largest independent AdTech company, Criteo processes 9 billion bid requests every day across three continents and six data centers. Their infrastructure is built on bare-metal hardware, with over 30,000 servers. This scale requires an infrastructure that moves beyond manual configuration. To manage the massive flow of traffic, Criteo transitioned from vendor-locked hardware appliances to a fully automated Load Balancing as a Service (LBaaS) platform built on HAProxy.

SRE Basha Mougamadou explained at HAProxyConf how Criteo automates its load balancing stack to improve certificate management, backend scaling, and CPU efficiency.

Watch Criteo’s presentation from HAProxyConf and read the transcript.

Moving to a runtime-first philosophy

Handling 1 terabit per second across six data centers means any change to the system must be efficient. Criteo moved away from traditional configuration management and made the HAProxy Runtime API the primary interface for all operational changes. This allows the infrastructure to scale and update without manual intervention.

The scale of the operation makes this approach necessary. Because certificates and backends must be continuously updated and scaled, the system requires a high level of automation. If these events required manual work or process restarts, the overhead would quickly add up.

To avoid this, Criteo designed a system where the configuration remains static while the internal state of the load balancer changes dynamically. This shift removes the operational cost associated with high-frequency updates.

Automating TLS certificate management

Criteo maintains more than 5,000 TLS certificates with a 3-month validity period. These certificates require frequent rotation to maintain high security standards. Traditionally, updating a certificate required a manual configuration change. At Criteo's scale, the goal was to update security credentials automatically and in real time.

The team uses a three-part system to handle these updates. 

  1. The certificate provider microservice manages the lifecycle and communicates with external authorities.

  2. A control plane pulls renewed TLS certificates and sends updates to provisioners.

  3. These provisioners run locally on the load balancer nodes to update HAProxy.

The provisioners use the HAProxy Runtime API to modify TLS certificates in memory. This process involves a four-step transaction. The system allocates a new certificate, sets its contents, commits the transaction, and adds it to the list. This method allows Criteo to renew roughly 100 certificates daily without manual config updates.
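The four-step transaction can be sketched as a list of Runtime API commands. This is a minimal illustration, not Criteo's provisioner: the function names, file paths, and socket path are hypothetical, and it assumes the commands `new ssl cert`, `set ssl cert`, `commit ssl cert`, and `add ssl crt-list` are available on the stats socket of the HAProxy version in use.

```python
import socket


def certificate_transaction(cert_path: str, pem: str, crt_list_path: str) -> list[str]:
    """Build the four Runtime API commands for adding a certificate in memory.

    cert_path and crt_list_path are hypothetical names chosen by the caller.
    """
    # The payload of a "<<" command ends at the first empty line, so blank
    # lines inside the PEM bundle are dropped before it is sent.
    payload = "\n".join(line for line in pem.splitlines() if line.strip())
    return [
        f"new ssl cert {cert_path}\n",                  # 1. allocate an empty slot
        f"set ssl cert {cert_path} <<\n{payload}\n\n",  # 2. set its contents
        f"commit ssl cert {cert_path}\n",               # 3. commit the transaction
        f"add ssl crt-list {crt_list_path} {cert_path}\n",  # 4. add it to the list
    ]


def send_command(socket_path: str, command: str) -> str:
    """Send one command to the HAProxy stats socket and return the reply.

    Assumes a Unix stats socket in non-interactive mode, where HAProxy
    closes the connection after answering a single command.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

A provisioner following this pattern would send each command in order through `send_command` and stop at the first error reply, so that a failed `set` never gets committed.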

This architecture ensures that the file system and the running process stay synchronized. If a node restarts, it loads the latest certificates from disk. This automation helps address the industry trend toward shorter certificate lifespans, with proposals to reduce validity to as little as 47 days.

Dynamic server provisioning for rapid autoscaling

Criteo operates over 100,000 containers across Kubernetes and Apache Mesos clusters. Application instance counts fluctuate significantly throughout the day. One application may grow from 115 instances to 600 in a few hours.

Modern infrastructure requires a way to add and remove these backend servers instantly. Criteo uses dynamic servers, which gained full support in HAProxy 2.5, to manage these changes. Dynamic servers allow the team to provision and delete servers on the fly.

Named servers are a practical benefit of this approach. Where older server templates used generic indexes like srv1 and srv2, dynamic provisioning gives each server a unique, descriptive name in the logs, making it significantly easier to trace errors to a specific container during an incident.
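As a rough sketch of what provisioning a named dynamic server could look like, the helper below builds the Runtime API commands for one container. The backend name, the naming scheme, and the function itself are hypothetical. It also assumes that a server created with `add server` starts in maintenance mode and needs an explicit `enable server` before it receives traffic.

```python
def provision_commands(backend: str, container_id: str, address: str, port: int) -> list[str]:
    """Build Runtime API commands that add and enable one dynamic server.

    The server name is derived from the container ID (a hypothetical naming
    scheme) so that log lines point at a specific container.
    """
    name = f"ctr-{container_id}"
    # Reject names that would break the "<backend>/<server>" syntax.
    if any(ch.isspace() or ch == "/" for ch in name):
        raise ValueError(f"invalid server name: {name!r}")
    target = f"{backend}/{name}"
    return [
        f"add server {target} {address}:{port}",  # create the server at runtime
        f"enable server {target}",                # take it out of maintenance
    ]
```

For example, `provision_commands("be_app", "a1b2c3", "10.0.0.7", 8080)` yields an `add server be_app/ctr-a1b2c3 10.0.0.7:8080` command followed by the matching `enable server` command.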

Removal follows a deliberate sequence: disable the server, drain active sessions, wait for the short "removable state" window, then delete. This prevents dropped requests during autoscaling and preserves consistent hashing for Criteo's Varnish cache layer, ensuring requests for a given path always reach the same cache node.
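The removal sequence can be sketched as a small function that disables the server, waits for sessions to drain, and only then deletes it. This is an illustration under stated assumptions, not Criteo's code: `send` and `count_sessions` are hypothetical callables supplied by the caller (for instance, a wrapper around the stats socket and a parser for its session counters), and the timeout values are arbitrary.

```python
import time
from typing import Callable


def remove_server(
    send: Callable[[str], object],
    count_sessions: Callable[[str, str], int],
    backend: str,
    name: str,
    timeout: float = 30.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Disable a server, wait for it to drain, then delete it.

    Returns True if the server was deleted, False if sessions were still
    active when the timeout expired (the server then stays disabled).
    """
    target = f"{backend}/{name}"
    send(f"disable server {target}")  # stop routing new requests to it

    deadline = time.monotonic() + timeout
    while count_sessions(backend, name) > 0:  # drain active sessions
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)

    send(f"del server {target}")  # safe to delete once it is idle
    return True
```

Returning `False` instead of forcing the deletion leaves the decision to the caller, which can retry later or close the remaining sessions explicitly.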

Aligning software threads with CPU hardware

At 23 million requests per second, small gains in CPU efficiency have a large impact on total capacity. Criteo uses AMD EPYC 7502P processors. These chips use a chiplet architecture where CPU cores are organized into Core Complex Dies.

Data travels much faster between cores on the same chiplet than it does across the I/O die, so the latency of sharing data increases significantly when a thread moves from one die to another. To address this, Criteo uses HAProxy's newer CPU policy features to group threads according to the hardware layout.

The team uses the group-by-2-clusters policy to bind HAProxy threads to the physical layout of the AMD chip. This keeps related threads on the same core complex, which helps the CPU share data more efficiently and reduces the need for the processor to move information across the entire chip.
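To make the idea concrete, the toy model below groups CPU clusters two at a time into thread groups. It is not HAProxy's implementation, and the topology used in the example (eight clusters of four cores) is an assumption chosen for illustration, not a figure from Criteo's talk.

```python
def group_clusters(clusters: list[list[int]], per_group: int = 2) -> list[list[int]]:
    """Merge consecutive CPU clusters into thread groups.

    clusters is a list of CPU ID lists, one per cluster of cores that share
    a cache. Each returned group contains the CPUs of per_group neighbouring
    clusters, so threads in a group stay close to each other in hardware.
    """
    if per_group < 1:
        raise ValueError("per_group must be at least 1")
    groups = []
    for start in range(0, len(clusters), per_group):
        merged = [cpu for cluster in clusters[start:start + per_group] for cpu in cluster]
        groups.append(merged)
    return groups


# Hypothetical topology: 8 clusters of 4 cores each (32 cores in total).
example_clusters = [list(range(i * 4, i * 4 + 4)) for i in range(8)]
example_groups = group_clusters(example_clusters, per_group=2)
```

With this assumed layout, the 32 cores end up in four groups of eight, and each group spans only two neighbouring clusters instead of the whole chip.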

Criteo’s tests showed that this configuration reduced context switching by 20%. This change frees up CPU cycles to handle actual request traffic. The team also binds management processes to a specific core to avoid interrupting the main load balancing threads.

Read more: How HAProxy takes advantage of multi-core CPUs

Key takeaways for your infrastructure

| Feature | Impact |
| --- | --- |
| HAProxy Runtime API | Automation for 5,000+ TLS certs; better security posture. |
| Dynamic servers | Infinite backend scaling without config changes; cleaner logs. |
| Automatic CPU binding | 20% less context switching; optimized for modern multi-core CPUs. |

Criteo's journey shows what HAProxy can achieve when you treat it as a programmable engine rather than a static process: a platform that adapts to demand in real time, at any scale. As Criteo looks toward the future, they are working with the HAProxy team to implement even more dynamic features, including dynamic backends and frontends, to reach 100% automation.

For teams looking to build similar automation at scale — without building the control plane from scratch — HAProxy One makes this kind of infrastructure automation production-ready out of the box.
