Failover is a function in networking — implemented virtually and across physical devices such as routers, switches, storage devices, and application delivery controllers (ADCs) — that activates a backup when a primary node fails. This keeps traffic flowing and enables uninterrupted access to websites, apps, APIs, and AI/LLM services. 

Failover and recovery are also key components of modern DevOps practices, which place immense importance on building resilient systems.

How does failover work?

Failover brings high availability to environments in which constant uptime is critical. It reroutes traffic from a faulty networking component, database, or load balancing node/cluster to a working standby. This is typical within an active/passive networking setup. Failover relies on this redundancy and assumes that a fault can happen at any time. 

Failovers can happen in multiple ways depending on how the systems are clustered, but they usually start with a failed health check indicating that a system isn't functioning correctly. From there, traffic needs to be rerouted to other servers: either a hot-standby (passive) node or the other nodes in a pool. A load balancer checks the status of the services behind it and routes only to those that are functioning. The load balancers themselves usually fail over using a method such as VRRP (which moves a floating IP address to the active node), BGP with equal-cost multipath routing (where each active load balancer advertises the same address and an upstream router or switch hashes connections across them), or other methods (such as DNS entry switching or using a cloud provider's native load balancer).
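To make the health-check step concrete, here is a minimal Python sketch of active/passive selection. It is an illustration only, not how any particular load balancer is implemented: the `Node` class, the `record_check` and `pick_target` functions, and the `fall`/`rise` thresholds (consecutive failures before a node is marked down, consecutive passes before it is marked up again) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One server plus the bookkeeping a health checker keeps about it."""
    name: str
    healthy: bool = True
    streak: int = 0  # consecutive check results that contradict `healthy`


def record_check(node: Node, passed: bool, fall: int = 3, rise: int = 2) -> None:
    """Update a node's state after one health check.

    A healthy node is marked down after `fall` consecutive failures;
    a down node is marked up again after `rise` consecutive passes.
    """
    if passed == node.healthy:
        node.streak = 0  # result agrees with the current state
        return
    node.streak += 1
    threshold = rise if passed else fall
    if node.streak >= threshold:
        node.healthy = passed
        node.streak = 0


def pick_target(primary: Node, standby: Node) -> Optional[Node]:
    """Active/passive selection: prefer the primary, fall back to the standby."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    return None  # nothing healthy to route to
```

Requiring several consecutive results before changing state is a common way to avoid flapping: a single dropped probe does not trigger a failover, and a recovering node must prove itself before it receives traffic again.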

Automation is equally integral to failover. Unlike a switchover, which an administrator typically triggers by hand, failover shifts node prioritization without requiring direct intervention. It offloads much of this work to a preconfigured set of routing policies, which can take effect almost instantly when crashes or slowdowns strike. This requires continual health checking across the network to determine which components are working properly and which aren't.

Organizations can also implement a "failure-proof" setup with more than two backups, depending on their needs and industry. This may be necessary for compliance in environments that handle sensitive, mission-critical data, such as those in healthcare, finance, and energy. Reliability is also a common sticking point when vendors negotiate service-level agreements (SLAs) with their customers.

Organizations can implement failover across a given cluster of components, but also across multi-region clusters. This helps ensure that no matter where a service or database lives, it will remain reachable to anyone regardless of their geographical location. 
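Multi-region failover can be pictured as an ordered preference list per client location: send users to their nearest region, and move down the list when that region is unhealthy. The Python sketch below illustrates the idea under stated assumptions. The region names, the `REGION_PREFERENCE` table, and the `choose_region` function are hypothetical and do not correspond to any product's API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical regions and preference orders, chosen only for illustration.
# Each client location lists regions from most to least preferred.
REGION_PREFERENCE: Dict[str, List[str]] = {
    "europe": ["eu-west", "us-east", "ap-south"],
    "north-america": ["us-east", "eu-west", "ap-south"],
}

# Order used for client locations that have no entry of their own.
DEFAULT_PREFERENCE: List[str] = ["us-east", "eu-west", "ap-south"]


def choose_region(client_location: str, healthy_regions: Set[str]) -> Optional[str]:
    """Return the most preferred healthy region for a client, or None."""
    preference = REGION_PREFERENCE.get(client_location, DEFAULT_PREFERENCE)
    for region in preference:
        if region in healthy_regions:
            return region
    return None  # every region is down


# A European client is served from eu-west until that region fails.
print(choose_region("europe", {"eu-west", "us-east", "ap-south"}))
print(choose_region("europe", {"us-east", "ap-south"}))
```

In practice the set of healthy regions would come from the same kind of health checking described earlier, and the decision might be carried out through DNS or a global load balancer rather than application code.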

VRRP (mentioned earlier) helps facilitate seamless failover across a group of devices. The devices share a virtual IP address and elect one member to answer for it, and the election adapts as devices join or leave the group, without sacrificing flexible configurability or granular management.
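The election at the heart of VRRP can be sketched in a few lines of Python. This is a simplified model, not an implementation of the protocol: it ignores advertisement timers, preemption settings, and packet formats, and the `Router` class, `elect_master` function, and example addresses are assumptions made for illustration. It captures only the core rule that the live device with the highest priority holds the virtual IP address, with ties going to the higher interface address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Router:
    name: str
    address: str   # the device's own interface address
    priority: int  # higher priority wins the election


def elect_master(routers: Iterable[Router], alive: Set[str]) -> Optional[Router]:
    """Return the live router that should hold the virtual IP, or None.

    The highest priority wins; ties go to the numerically higher address.
    """
    candidates = [r for r in routers if r.name in alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, int(ipaddress.ip_address(r.address))),
    )


# Hypothetical three-node group using documentation-range addresses.
group = [
    Router("lb1", "192.0.2.11", 150),
    Router("lb2", "192.0.2.12", 100),
    Router("lb3", "192.0.2.13", 100),
]
```

With all three devices alive, `elect_master(group, {"lb1", "lb2", "lb3"})` returns `lb1`. If `lb1` disappears, the two remaining devices tie on priority and `lb3` wins on its higher address.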

Does HAProxy support failover?

Yes! HAProxy One includes multiple features that keep backend systems running smoothly — including automated health checks, VRRP support, multi-region failover, and active/passive clustering. These intelligently keep traffic flowing for any service in any environment. 

To learn more about failover support in HAProxy Enterprise, check out our active/standby clustering documentation.