When someone says a website is available, they mean that they can access that website. The application they’re trying to reach is up and working properly. High availability means that the website is up most of the time throughout the year. Companies can even put a percentage on this, striving for 100% availability, but typically landing a bit lower, such as 99.9% or 99.99%. Five nines of availability, an expression that means 99.999% uptime, translates to being down for only about five minutes out of the year. It is a trait of companies that have reached the outer limit of what’s possible in terms of availability.
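To make those percentages concrete, here is a small sketch of the arithmetic. The function name `allowed_downtime_minutes` is made up for this example, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the downtime budget for the targets mentioned in the text.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the budget by a factor of ten, which is why the last few are so hard to earn.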
How do they do it?
Achieving high availability for your website can be a discipline unto itself, requiring you to examine each component that might fail: the network, the power grid, the likelihood of a natural disaster demolishing the datacenter where your servers are running. There are creative solutions to each of these failure points, but picking the low-hanging fruit first might be your best option. Let’s take a look at a few easy steps you can take.
#1. Run multiple copies of your website
If I were to put all of my money into a suitcase, hand that suitcase over at the baggage check before taking a flight, and then hope it arrives with me at my destination, you’d probably tell me that I was taking a risk. I would have created a single point of failure for myself by placing so much importance on one piece of baggage.
Similarly, it’s risky to host your website on only a single server. If, for whatever reason, that server malfunctions, users won’t be able to access your services. Instead, you should create copies and run them simultaneously so that if one server fails it does not become a catastrophic event. Load balancing is key here, since it allows you to relay traffic to all of your servers in a roughly equal distribution.
Having redundancy like this enables you to take servers out of the rotation for routine maintenance without disrupting active users. With enough redundancy, you might even decide to sleep instead of jumping into action when one or two servers fail during the night, and wait until you can inspect the problem thoroughly the next day.
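To illustrate the idea of a rotation, here is a minimal sketch of round-robin distribution that skips servers taken out for maintenance. The class `RoundRobinPool` and the server names are hypothetical. This is a toy model of the concept, not how HAProxy is implemented.

```python
class RoundRobinPool:
    """Hand out servers in turn, skipping any that are out of rotation."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._in_rotation = set(self._servers)
        self._index = 0

    def disable(self, server):
        """Take a server out of rotation, for example for maintenance."""
        self._in_rotation.discard(server)

    def enable(self, server):
        """Return a known server to the rotation."""
        if server in self._servers:
            self._in_rotation.add(server)

    def next_server(self):
        """Return the next server in rotation, in round-robin order."""
        if not self._in_rotation:
            raise RuntimeError("no servers available")
        # Visit each slot at most once, advancing past disabled servers.
        for _ in range(len(self._servers)):
            server = self._servers[self._index]
            self._index = (self._index + 1) % len(self._servers)
            if server in self._in_rotation:
                return server
        raise RuntimeError("no servers available")


pool = RoundRobinPool(["web1", "web2", "web3"])
pool.disable("web2")  # web2 is down for maintenance
print([pool.next_server() for _ in range(4)])  # requests alternate between web1 and web3
```

With three copies running, removing one still leaves two to share the traffic, so users never notice the maintenance window.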
#2. Monitor for errors
Running multiple copies of your website should be followed by monitoring each server for errors, and then removing from the load balancing rotation any servers that become dysfunctional. Sometimes servers lose connectivity due to hardware failures or misconfiguration. Or, a bug may have been introduced into the application. Whatever the cause, it’s important to stop sending users to those servers until they can be repaired.
HAProxy provides built-in health checking of any servers that it’s load balancing, periodically sending a connection attempt or web request to verify that the service is up. The perk of this approach is that there’s no manual intervention required. Unhealthy servers are detected quickly and removed automatically.
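The general shape of such a health check can be sketched in a few lines. The example below is an illustration under stated assumptions, not HAProxy's code: `tcp_check` and `HealthTracker` are names invented here, and the `fall` and `rise` thresholds (consecutive failures before a server is marked down, consecutive successes before it is marked up again) are example values.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Track one server's state from a stream of check results."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures needed to mark the server down
        self.rise = rise      # consecutive successes needed to mark it up again
        self.healthy = True
        self._streak = 0      # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one check result and return the server's current state."""
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if check_passed else self.fall
            if self._streak >= threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


# Simulated results; in practice each value would come from tcp_check(host, port).
tracker = HealthTracker(fall=3, rise=2)
for result in (False, False, False, True, True):
    print("check passed:", result, "-> in rotation:", tracker.record(result))
```

Requiring several consecutive results before changing state keeps a single dropped packet from bouncing a server in and out of the rotation.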
#3. Implement a deployment strategy
Many organizations deploy new versions of their web applications several times per week if not more often. To reduce risk, they develop a strategy for deploying updates that avoids disrupting active users. There are several popular methods, including rolling deployments and blue-green deployments, and tools exist for making the process simple and repeatable.
Containers have revolutionized this aspect, and container orchestration platforms like Kubernetes facilitate common deployment patterns, allowing you to swap one container for another easily. The important thing is that when you need to upgrade the fleet of servers that run your website, you do so in a controlled way.
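As a rough sketch of what "controlled" can mean, the function below models a rolling deployment: upgrade a small batch, verify it, and halt before touching the rest if anything looks wrong. The function `rolling_deploy` and its `upgrade` and `is_healthy` callbacks are hypothetical placeholders for whatever your tooling provides.

```python
def rolling_deploy(servers, upgrade, is_healthy, batch_size=1):
    """Upgrade servers batch by batch, halting at the first unhealthy batch.

    Returns the list of servers that were upgraded and verified.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    upgraded = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            upgrade(server)
        # Verify the batch before moving on, so a bad release reaches few servers.
        failed = [server for server in batch if not is_healthy(server)]
        if failed:
            raise RuntimeError(f"rollout halted, unhealthy after upgrade: {failed}")
        upgraded.extend(batch)
    return upgraded


# Toy usage: the "fleet" is a dictionary mapping server names to versions.
fleet = {"web1": "v1", "web2": "v1", "web3": "v1"}
done = rolling_deploy(
    list(fleet),
    upgrade=lambda server: fleet.update({server: "v2"}),
    is_healthy=lambda server: fleet[server] == "v2",
)
print(done, fleet)
```

Keeping the batch small means the remaining servers continue serving the old version while each new batch proves itself.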
#4. Test that it works
Have you ever experienced a power outage at your home, grabbed the flashlight out of its drawer, and only then discovered that the flashlight’s batteries had run out of juice? Also, how many of us check our fire alarms twice per year? We all know the benefits of testing, but it’s the follow-through that makes the difference!
Any strategy for high availability should incorporate regular testing. These exercises can often uncover gaps in your planning or areas where you can make better use of built-in features. For example, before powering off a server, you could use HAProxy’s drain mode to gracefully wind down active connections. Then, when returning the server to duty, you could use the slowstart feature to gradually ramp up traffic.
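The idea of gradually ramping up traffic can be sketched as a share of traffic that grows over a warm-up window. The function `ramped_weight` and its parameters are invented for this illustration, and the linear ramp is an assumption made for simplicity.

```python
def ramped_weight(full_weight, seconds_since_up, slowstart_seconds):
    """Return the share of traffic a recovering server should receive.

    The weight grows linearly from 0 to full_weight over the warm-up window.
    """
    if slowstart_seconds <= 0 or seconds_since_up >= slowstart_seconds:
        return full_weight
    fraction = max(seconds_since_up, 0) / slowstart_seconds
    return full_weight * fraction


# A server with weight 100 and a 30-second warm-up window.
for elapsed in (0, 15, 30, 45):
    print(f"{elapsed}s after returning: weight {ramped_weight(100, elapsed, 30)}")
```

A gentle ramp gives caches and connection pools time to warm up before the server carries its full load, which is worth verifying during a test rather than during an outage.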
When testing critical components in your network, such as web servers, be sure to send notice to all those within your organization who might be affected in case things don’t go as planned.
Achieving high availability can involve building redundancy into the many layers that support your network infrastructure, but there are simple steps that accomplish a lot without costing much time or adding much complexity. Start simple, but remember to test it!
Curious to see how we help companies build high availability and defend against malicious bots and other threats? Check out our User Spotlight series.