Load balancing is an indispensable technique for improving a website’s performance. I’ll explain why. With Firefox’s Web Developer Tools open, I visited a popular retailer’s website to see how many HTTP requests my browser made when loading the site. In this case, I counted 119 requests needed to render the landing page.

This isn’t uncommon. Just as physical buildings require bricks, mortar, glass, and steel to give them structure, online spaces require markup, code, graphics, and text to make them whole. Websites blend together a mosaic of images, icons, stylesheets, fonts, scripts, JSON, and HTML to form the digital spaces in which we shop, bank, learn, and socialize. For each component, my browser makes a unique request, and those requests add up quickly. To allow hundreds or even thousands of visitors to load all of these resources simultaneously, with everyone getting the same, responsive experience, companies need to use techniques that scale to accommodate the demand.

Load balancing is one such technique. When you load balance traffic, you’re distributing incoming requests to a collection of HTTP servers that all host their own copies of the files. While one server is returning an image, another is returning the JavaScript, and another is returning the HTML. By divvying up the work, the resources can be fetched in parallel in a way that supports many active users.
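In HAProxy, this kind of setup can be sketched as a frontend that accepts traffic and a backend listing the servers that hold identical copies of the files. This is a minimal illustration, not a production config; the server names and addresses are hypothetical:

```
frontend www
    mode http
    bind :80
    default_backend web_servers

backend web_servers
    mode http
    # Each server hosts an identical copy of the site's files,
    # so any of them can answer any request
    server web1 192.0.2.10:80 check
    server web2 192.0.2.11:80 check
    server web3 192.0.2.12:80 check
```

The `check` keyword enables health checks, so a server that stops responding is taken out of the rotation automatically.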

Horizontal scaling for better performance

Load balancing enables you to distribute work to a group of servers that host identical copies of your application’s files. With cloud-based infrastructure making it easy to create virtual servers on the fly, the number of servers in the group can grow or shrink to match current demand. This ability to change the number of servers at will is called horizontal scaling. When used to match demand, horizontal scaling improves the performance of your web applications at a lower cost than techniques like vertical scaling, which involves upgrading to more expensive machines (e.g. buying the server with the most powerful CPU). Horizontal scaling is cheaper in the long run because you can often use inexpensive, commodity servers.

Horizontal scaling in a nutshell: By adding more workers, more work can get done simultaneously, and visitors experience a snappier website overall. The key, though, is choosing the right load balancing algorithm for your use case. The algorithm decides which server in the group should get the next request.

Here is a sampling of algorithms that the HAProxy load balancer supports:

  • A round-robin algorithm sends requests to each server in a simple rotation. This is perfect when fetching resources, such as HTML or JSON files, that take a short and predictable amount of time to return. When response times don’t vary all that much, rotating through the servers works well to keep load (especially memory usage) evenly balanced across workers.
  • A least-connections algorithm sends requests to the server that’s the least busy. This works well when fetching resources that have variable response times. For example, when requesting a list of affordable airplane tickets, the server might need to aggregate data about scheduled flights, the number of seats remaining, and any eligible deals. Collecting that data can consume significant CPU time or I/O bandwidth, so routing the request to the least busy server helps it complete faster and avoids bottlenecks.
  • A hash-based algorithm, which associates each server with a particular kind of request, for example based on the URL, ensures that requests for the same resource will always go to the same server. It’s ideal for fetching cached files or data held in data shards. Incoming requests go straight to the cache server or shard that holds the requested information.

Promoting equal server utilization

At the heart of horizontal scaling and load balancing is the idea that operating multiple servers keeps the load on each one at a manageable level. When a server has some breathing room, when it isn’t operating near 100% of its capacity, it typically performs better. By spreading load across many servers, you keep any one server from becoming overloaded.

Apart from scaling out the number of servers, you can also think about using a load balancer as a gateway that shields the servers from traffic spikes. HAProxy can queue requests in front of servers, sending only a limited number at a time. Queuing improves performance by leveling out spikes in traffic so that servers can stay within an optimal range of work.
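HAProxy implements this queuing with a per-server `maxconn` limit: requests beyond the limit wait in the backend’s queue instead of hitting the server. A sketch with hypothetical limits:

```
backend web_servers
    mode http
    balance leastconn
    timeout queue 30s    # give up if a request waits in the queue too long
    # Each server handles at most 30 concurrent requests;
    # the rest queue on the load balancer, leveling out traffic spikes
    server web1 192.0.2.10:80 maxconn 30 check
    server web2 192.0.2.11:80 maxconn 30 check
```

The right `maxconn` value depends on your servers’ capacity, so it’s worth load testing rather than guessing.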

Load balancing has other benefits too. It makes the site highly available by ensuring that if one server malfunctions, others exist to pick up the slack. You can also offload work to the load balancer, such as compression, SSL termination, and response caching, which further improves your application’s performance, since it no longer needs to do that processing itself. Also, configuration settings like HTTP Keepalive and server-side connection pooling help improve performance even more.
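The offloading features mentioned above each map to a few lines of HAProxy configuration. This is an illustrative sketch only; the certificate path, cache name, and size limits are hypothetical:

```
frontend www
    mode http
    # SSL termination: TLS ends here, so backend servers speak plain HTTP
    bind :443 ssl crt /etc/haproxy/certs/example.pem
    # Compress text responses on the load balancer instead of on the servers
    compression algo gzip
    compression type text/html text/css application/javascript
    default_backend web_servers

backend web_servers
    mode http
    option http-keep-alive              # reuse connections across requests
    http-request cache-use static       # serve cached responses when possible
    http-response cache-store static    # store cacheable responses
    server web1 192.0.2.10:80 check

cache static
    total-max-size 64    # megabytes
    max-age 240          # seconds
```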


Websites need to load plenty of files to build the digital spaces in which we all shop, learn, and hang out, but delivering those files quickly to many users at once can be a challenge. Load balancing and horizontal scaling help make it possible by distributing requests to a group of servers so that work can be done in parallel. HAProxy provides additional capabilities, including queuing, compression, SSL termination, and response caching, that help further improve performance.