Quoting A0 Labs' CEO Olivier Warin:
A0 Labs is a French hosting company specialized in critical applications which need high performance and security. We have been using HAProxy for several years now and deploy it for our customers to provide high availability and performance to their web sites.
I would personally like to add that A0 Labs helped a lot on the FreeBSD port a few years ago, regularly helps by benchmarking and testing new releases, and contributes to quite a number of other open source projects, not just HAProxy!
Airbnb built SmartStack on top of HAProxy to have exactly the load balancer they needed. The article explains in detail the limitations of all the other solutions they considered (including HAProxy itself).
Taobao's CDN is the world's largest picture CDN. It delivers content for all online shops hosted by Taobao and Alibaba, which represent around 80% of China's online business. They use a lot of open source software and a simple, scalable architecture including LVS + HAProxy for the load balancing layer, Squid for the cache and their fork of nginx (Tengine) for the servers. A fairly complete and interesting overview is available here.
“Biblio.com uses HAProxy to load balance and provide fault tolerance for 10 million searches per day across a cluster of Solr search servers that index 95 million bibliographic records of used and rare books. HAProxy has been very fast and rock solid for us. Thanks for providing it.”
- CEO Brendan Sherar
BitPusher designs, implements and manages infrastructures that are highly reliable, scalable and performant. Quoting CEO Daniel Lieberman:
We use HAProxy extensively, both as a standard edge load balancer and (especially in AWS) as a proxy for internal services, avoiding the need for IP-based failover.
An article on High Scalability gives some details about the architecture behind DISQUS and the traffic it has to deal with. In short: 17,000 requests per second and 250 million visitors. They're using HAProxy both in front of the web servers and in front of the database. Much more information can be found in this detailed presentation.
Egnyte is a Cloud File Server. Quoting Sachin Shetty:
HAProxy is a fantastic feature-rich load balancer and we at Egnyte have been using it for a while. Apart from using HAProxy for standard application load balancing, we are using HAProxy to overcome some limitations of Apache, such as using the queue timeout to prevent backlogging in Apache when application servers are loaded. We also use HAProxy for load direction to route requests, i.e. to send requests to a specific server under specific conditions and fail over accordingly.
Thanks guys, your nice feedback is much appreciated!
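For readers curious what the two techniques mentioned by Egnyte might look like, here is a minimal configuration sketch. It is not Egnyte's actual setup: the frontend and backend names, the addresses, the `/upload` path and all limits and timeouts are assumptions chosen for illustration. It relies only on standard directives (`timeout queue`, per-server `maxconn`, `acl`, `use_backend` and the `backup` server option).

```
# Hypothetical names, addresses and values throughout.
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    # Give up on a request that waits too long for a free server slot,
    # so a loaded application tier does not build up an unbounded backlog.
    timeout queue   10s

frontend fe_web
    bind :80
    # Send requests under a given path prefix to a dedicated backend.
    acl is_upload path_beg /upload
    use_backend be_upload if is_upload
    default_backend be_app

backend be_app
    # maxconn caps concurrent requests per server; the excess waits in
    # HAProxy's queue instead of piling up on the web server.
    server app1 192.0.2.11:8080 check maxconn 50
    server app2 192.0.2.12:8080 check maxconn 50

backend be_upload
    # The backup server only receives traffic when upload1 fails its checks.
    server upload1 192.0.2.21:8080 check maxconn 20
    server upload2 192.0.2.22:8080 check backup
```

With settings like these, a request that stays queued longer than `timeout queue` is answered with an error by HAProxy rather than being held indefinitely, which is one way to keep a slow application tier from dragging the web servers down with it.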
Héctor Paz, the sites' sysadmin, reported on the HAProxy mailing list:
We use HAProxy to handle web traffic for Peruvian news sites: elcomercio.pe, peru21.pe, etc. Around 2k session rate in peak hours. HAProxy is the most reliable part of our architecture.
Farmville is one of the most popular online games, published by Zynga. Mark William indicated here that in mid-2010 the site had over 70 million active users a month. While Zynga doesn't explicitly advertise its use of HAProxy, they don't hide it either, as they report using RightScale at Amazon EC2 to scale seamlessly, and even the error page has it in its URL.
Free is a major player among the French ISPs. Free has always promoted the use of free software, and has been using HAProxy for many years. The Webmail and the file exchange service are the most heavily loaded deployments ever reported in terms of network bandwidth, with more than 5 Gbps of traffic at any moment. They regularly provide extremely valuable feedback, which has helped make 10 Gbps performance a reality and turn TCP splicing into a reliable solution.
There's a lengthy article on the GitHub Blog entitled "How We Made GitHub Fast". It explains in depth how the GitHub architecture works, and there's a lot to learn there for anyone who's planning on starting a scalable site. Interestingly, a second article here gives a few more details as to why they're not only using HAProxy but ldirectord too (e.g. smaller memory usage in VMs).
Globo.Tech is a Hosting and Managed Service Provider in Canada. Anthony Levesque describes how they are using HAProxy in their internal infrastructure, as well as for their client-facing needs:
We know that using HAProxy allows us to build robust and scalable setups not only for our clients, but for our internal services as well. We operate multiple internal or private cloud Infrastructures where the management layer's high availability and scalability is done with HAProxy. HAProxy is a reliable constant in our clustering design.
The Imgur guy describes his architecture choices on Reddit here and why HAProxy makes a good choice for him here. The full thread is quite informative about what issues such fast-growing sites are facing.
Jon Watte, IMVU's CTO, describes on Slideshare how IMVU's architecture works and how it scales. HAProxy is just one small piece of the puzzle there despite sitting at the front. The site's home page indicates the number of concurrent users in real time (more than 120k when last checked). It's nice to see some large sites that care about latency and report their usage numbers.
Mike Krieger explained on Slideshare how they scaled Instagram to 30 million users in less than 2 years. Now they're the first suggestion when you type the letter "i" into Google! Of course, when extremely fast scaling is needed, HAProxy is in the mix :-)
Linux Kernel hosting infrastructure's admin and architect, Konstantin Ryabitsev, shares his experience running one of the most challenging sites dedicated to development, and of course it runs behind HAProxy :-)
London Trust Media's President, Jonathan Roudier, says:
Half of the Internet is built on HAProxy. We are pleased to support the Internet!
While I personally think that "half of the internet" is probably a bit overstated, I'd like to say that London Trust Media is one of those now-rare companies so obsessed with their customers' experience that they're constantly striving to squeeze the last possible nanosecond of latency out of their infrastructure. I'm delighted to discuss with them bits and bytes, CPU affinity, cache miss latency and other important considerations that tend to be dismissed too often by many users these days. Sharing experience with people able to provide test results is pleasant and helps design better solutions. Thanks for your support guys and don't change the way you work!
MaxCDN indicates here that they're using HAProxy in their CDN solution.
Kevin Phair of NYI reports that HAProxy handles the load balancing aspects of their Fault Tolerant Web service:
HAProxy easily fits our performance needs, and we find it far easier to manage and trouble-shoot than any of the expensive big-name load balancer options that we have had experience with.
The Olark guys explained here how they set up their site with high availability, and some of their decisions to ensure uninterrupted service in case something goes wrong. They give a few more details about the monitoring and some architecture fixes here. Please note that Olark is among the cool companies who funded the development of a number of features.
pfSense is an open source firewall based on FreeBSD and has an optional HAProxy module along with a web interface for configuring HAProxy. More information on the package is available here.
Ravelry is a social network dedicated to knitting that was founded by Casey and Jessica Forbes in 2007. It quickly became a great success, and Casey had to make significant changes to the architecture several times to keep up with the growth. In 2008, one year after the project was born, Casey told me:
HAProxy is fantastic. We use it at http://www.ravelry.com to handle 5 million or so requests per day.
And now we're in 2011... Their project is quite original and I wish them a long success story!
It's probably the only site that is so open about its infrastructure that even its HAProxy configuration is available to everyone!
As described in their architecture overview, Red Hat uses HAProxy as the load balancing solution in its cloud architecture OpenShift. While I know for sure they're not the first cloud provider to use it, I can say that they're the first one to openly admit it and that's nice of them (their architecture overview is well detailed and worth a read BTW).
Ben Timby of SmartFile says:
Hundreds of gigabytes of data flow from SmartFile through HAProxy each day. SmartFile uses HAProxy both for HTTP and FTP protocol load balancing. The PROXY protocol makes it possible to provide highly available and lightning fast FTP service. Without the many features of HAProxy and the support of Willy Tarreau, dragging the old FTP protocol into the 21st century would have been near impossible. Many thanks for such a stable and flexible product.
Please note that SmartFile was kind enough to fund development of the server-side PROXY protocol implementation.
Anthony Gerrard of SOS Children's Villages UK says:
SOS Children's Villages UK have been using HAProxy with great success for nearly 3 years now. We were previously using a popular HTTP server's load balancing capabilities for distributing traffic to our CMS instances but were experiencing issues that all went away after the move to HAProxy. Since then we've also made use of its capabilities to run content experiments on our donation forms by directing traffic to different Tomcat back-ends based on HTTP path parameters.
The same team is managing both sites. They're well known and have high expectations on reliability and quality of service. They've funded the development of the anti-abuse features in HAProxy.
We're big fans of HAProxy, which the guys at Reddit turned us on to. It has been working flawlessly for us in load balancing Stack Overflow between two - and now three - servers.
There's also a nice presentation of the updated architecture (2014) on High Scalability.
Transloadit is a file upload service for web applications. One of its co-founders, Kevin van Zonneveld, explains that he uses it for the content switching, and also gives some hints about setting up logging under Ubuntu. I'll probably have to put that into the doc because it looks like it was not obvious.
As of April 2012, TubeMogul is the biggest Real Time Bidding video ads platform. Nicolas Brousse, Lead Operations Engineer, says: “We use HAProxy in four different EC2 regions and five Availability Zones. It allows us to handle over 10 billion HTTP bid requests a day and deliver over six billion video ad streams last year”.
As of December 2011, Tuenti is the most trafficked website in Spain with more than 12M users and 40 billion page views a month. In the following presentation, Senior Systems Engineer Ricardo Bartolomé explains the previous load balancing infrastructure, the new load balancing strategy, as well as the reasons why they chose HAProxy as the Layer 7 load balancing solution: http://www.slideshare.net/ricbartm/load-balancing-at-tuenti.
This article gives some details on the Tumblr architecture. As of Feb 2012, it's at 500M page views a day, 40k requests/s with plans to go to 400k, and observes a 30% monthly growth. It involves more than 1000 servers, and employs 25 HAProxy, 15 Varnish and 8 nginx to make this run smoothly, the same winning trio that is found on many large sites!
John Adams explains here how they scaled Twitter to support a traffic growth of 1358% in 2009. It looks like they adopted the principle of "one component per function" which generally scales the best. There is very little information about the load balancing part in this slide show, but it also happens that scaling the rest is much more important.
There was a presentation from Virgin America at LinuxCon 2010 where they explained how they migrated to full open source. Among the numerous products involved, HAProxy is used for the load balancing. The complete presentation is available in PDF format here.
The W3C obviously needs no introduction if you're working in web environments. Yes, when you visit the W3C, you're passing via HAProxy, as is explained here. From some past discussions, I remember it also helps protect the whole site against unintentional misuse caused by excessive document validation.
This is probably the fastest site I've ever seen and certainly one of the most highly stressed I know. They deliver the nice counter you can see on the HAProxy page to millions of web sites around the world. To get an idea of the load, consider that each time a page on one of these sites is viewed, they receive a request. Their response time and availability are obviously critical to those sites, and they excel in this area with sub-millisecond response times. This site perfectly fits HAProxy's strengths, and some of the high performance optimizations come directly from their feedback.