What is a Reverse-Proxy?
A reverse-proxy is a server which connects to upstream servers on behalf of users.
Basically, it usually maintains two TCP connections: one with the client and one with the upstream server.
The upstream server can be an application server, a load-balancer or another proxy/reverse-proxy.
For more details, please consult the page about the proxy mode of the ALOHA load balancer.
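The two TCP connections described above can be sketched with a tiny relay written in Python with asyncio. This is an illustration under our own assumptions, not how HAProxy or the ALOHA work internally. The function names (`pipe`, `handle_client`, `start_reverse_proxy`) are hypothetical. The relay accepts the client connection (connection #1), opens its own connection to the upstream server (connection #2) and copies bytes in both directions.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from one connection to the other until EOF, then close."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle_client(client_reader: asyncio.StreamReader,
                        client_writer: asyncio.StreamWriter,
                        upstream_host: str, upstream_port: int) -> None:
    # Connection #2: the relay opens its *own* TCP connection to the upstream.
    up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
    # Relay client -> upstream and upstream -> client concurrently.
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer))


async def start_reverse_proxy(listen_host: str, listen_port: int,
                              upstream_host: str, upstream_port: int):
    # Connection #1: clients connect to this listening socket.
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, upstream_host, upstream_port),
        listen_host, listen_port)
```

Whatever the client sends, the upstream server only sees the connection opened by the relay, from the relay's own address and port.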
Why use a reverse-proxy?
It can be used for several purposes:
- Improve security, performance and scalability
- Prevent direct access from a user to a server
- Share a single IP for multiple services
Reverse-proxies are commonly deployed in DMZs to give access to servers located in a more secure area of the infrastructure.
That way, the reverse-proxy hides the real servers, can block malicious requests, and can choose a server based on protocol or application information (e.g. URL, HTTP header, SNI, etc.).
Of course, a Reverse-proxy can act as a load-balancer 🙂
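To illustrate choosing a server from application information, here is a minimal Python sketch that picks a farm from the HTTP Host header and the URL path. The routing table, the farm names and `choose_farm` are hypothetical. A real reverse-proxy expresses this in its own configuration language.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: (Host, path prefix, farm). First match wins.
ROUTES = [
    ("www.example.com", "/static/", "farm_static"),
    ("www.example.com", "/", "farm_app"),
    ("api.example.com", "/", "farm_api"),
]


def choose_farm(host_header: str, request_target: str,
                default: str = "farm_default") -> str:
    """Pick a farm from the Host header and the URL path."""
    # Drop an optional ":port" (assumes a DNS name, not an IPv6 literal).
    host = host_header.split(":", 1)[0].lower()
    path = urlsplit(request_target).path or "/"
    for route_host, prefix, farm in ROUTES:
        if host == route_host and path.startswith(prefix):
            return farm
    return default
```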
Drawbacks of using a reverse-proxy?
The main drawback of a reverse-proxy is that it hides the user's IP address: since it acts on behalf of the user, it uses its own IP address to connect to the server.
There is a workaround, the transparent proxy, but this setup hardly works through firewalls or other reverse-proxies, because the server's default gateway must be the reverse-proxy.
Unfortunately, it is sometimes very useful to know the user's IP address when the connection reaches the application server.
It can be mandatory for some applications and it can ease troubleshooting.
The diagram below shows a common usage of a reverse-proxy: it is isolated in a DMZ and handles the users' traffic. It then connects to the LAN, where another reverse-proxy acts as a load-balancer.
Here is the flow of the requests and responses:
1. The client connects through the firewall to the reverse-proxy in the DMZ and sends it its request.
2. The reverse-proxy validates the request, analyzes it to choose the right farm, then forwards it through the firewall to the load-balancer in the LAN.
3. The load-balancer chooses a server in the farm and forwards the request to it.
4. The server processes the request, then answers the load-balancer.
5. The load-balancer forwards the response to the reverse-proxy.
6. The reverse-proxy forwards the response to the client.
Basically, the source IP is modified twice in this kind of architecture: during steps 2 and 3.
And of course, the more load-balancers and reverse-proxies you chain, the more times the source IP is changed.
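The source IP rewriting along the chain can be modeled in a few lines of Python. This toy model assumes plain, non-transparent proxies that always reconnect from their own address. The function name, hop names and IP addresses are made up for the example.

```python
def sources_seen(client_ip: str,
                 hops: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """For a chain of plain proxies, return the source IP each hop sees.

    Each hop is (name, own_ip); the last hop is the application server.
    """
    seen = []
    source = client_ip
    for name, own_ip in hops:
        seen.append((name, source))
        source = own_ip  # the next connection is opened from this hop's own IP
    return seen
```

With a client, the DMZ reverse-proxy, the LAN load-balancer and a server, the server ends up seeing the load-balancer's address, and the source changed once per proxy in between.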
The Proxy protocol
The proxy protocol was designed by Willy Tarreau, the author of HAProxy.
It is used between proxies (hence its name) or between a proxy and a server that understands it.
The main purpose of the proxy protocol is to forward information from the client connection to the next host.
This information includes:
- L4 and L3 protocol information
- L3 source and destination addresses
- L4 source and destination ports
That way, the proxy (or server) receiving this information can use it exactly as if the client were connected to it directly.
Basically, it is a single line of text that the first proxy sends to the next one right after connecting to it.
For example, here is the proxy protocol applied to an HTTP request:
PROXY TCP4 192.168.0.1 192.168.0.11 56324 80\r\n
GET / HTTP/1.1\r\n
Host: 192.168.0.11\r\n
\r\n
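To show how such a line is produced, here is a minimal Python sketch that builds a version 1 (human-readable) proxy protocol header for TCP over IPv4 or IPv6. The function name is hypothetical, and it only covers the `TCP4`/`TCP6` cases. The sender simply writes this line before the client's bytes.

```python
import ipaddress


def build_proxy_v1_header(src_ip: str, dst_ip: str,
                          src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 line describing the client connection."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP family")
    family = "TCP4" if src.version == 4 else "TCP6"
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    # Fields are separated by single spaces; the line ends with CRLF.
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")
```

Calling `build_proxy_v1_header("192.168.0.1", "192.168.0.11", 56324, 80)` returns the first line of the example above. The HTTP request then follows unchanged.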
Note: there is no need to change anything in the architecture, since the proxy protocol is just a string sent over the TCP connection that the sender opens to the service hosted by the receiver.
Now you can see how we can take advantage of this protocol to pass through the firewall while preserving the client IP address between the reverse-proxy (which can generate the proxy protocol string) and the load-balancer (which can parse it).
Stud and stunnel are two SSL reverse-proxy programs that can send proxy protocol information.
Between the Reverse-Proxy and the Load-Balancer
Since both devices must understand the proxy protocol, we'll assume that both the LB and the RP are ALOHA load-balancers.
Configuring the Reverse-proxy to send proxy protocol information
In the reverse proxy configuration, just add the keyword “send-proxy” on the server description line.
server srv1 192.168.10.1 check send-proxy
Configuring the Load-balancer to receive proxy protocol information
In the load-balancer configuration, just add the keyword “accept-proxy” on the bind description line.
bind 192.168.1.1:80 accept-proxy
The reverse-proxy opens a connection to the address bound by the load-balancer (192.168.1.1:80), just like a regular connection.
Once the TCP connection is established, the reverse-proxy sends a string with the client connection information.
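On the receiving side, the load-balancer has to read that line before anything else. The Python sketch below assumes a blocking socket. It handles only the version 1 `TCP4`/`TCP6` forms (not `UNKNOWN`, nor the binary version 2), and `read_proxy_v1` and `ProxyInfo` are hypothetical names. It caps the header at 107 bytes, the maximum length of a version 1 line including CRLF. It also returns any payload bytes received after the header, so they can be passed on to the application.

```python
import ipaddress
import socket
from typing import NamedTuple

MAX_V1_HEADER = 107  # maximum v1 line length, CRLF included


class ProxyInfo(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def read_proxy_v1(sock: socket.socket) -> tuple[ProxyInfo, bytes]:
    """Read and parse a PROXY v1 line; return it plus any leftover payload."""
    buf = b""
    while b"\r\n" not in buf:
        if len(buf) >= MAX_V1_HEADER:
            raise ValueError("PROXY header too long")
        chunk = sock.recv(4096)
        if not chunk:
            raise ValueError("connection closed before end of PROXY header")
        buf += chunk
    line, rest = buf.split(b"\r\n", 1)
    if len(line) + 2 > MAX_V1_HEADER:
        raise ValueError("PROXY header too long")
    parts = line.decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError(f"not a supported PROXY v1 line: {line!r}")
    _, family, src, dst, sport, dport = parts
    version = 4 if family == "TCP4" else 6
    for addr in (src, dst):
        if ipaddress.ip_address(addr).version != version:
            raise ValueError(f"address {addr} does not match {family}")
    src_port, dst_port = int(sport), int(dport)
    if not (0 <= src_port <= 65535 and 0 <= dst_port <= 65535):
        raise ValueError("port out of range")
    return ProxyInfo(family, src, dst, src_port, dst_port), rest
```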
The load-balancer can now use the client information provided through the proxy protocol exactly as if the connection was opened directly by the client itself. For example, we can match the client IP address in ACLs for white/black listing, stick-tables, etc…
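For example, an allow/deny check can now be evaluated on the recovered client IP instead of the reverse-proxy's IP. This Python sketch uses hypothetical network lists and a hypothetical `is_allowed` function. It roughly mirrors what a `src`-based ACL does, but it is not HAProxy's ACL engine.

```python
import ipaddress

# Hypothetical lists; deny rules are checked before allow rules.
DENY = [ipaddress.ip_network("203.0.113.0/24")]
ALLOW = [ipaddress.ip_network("198.51.100.0/24"),
         ipaddress.ip_network("2001:db8::/32")]


def is_allowed(client_ip: str) -> bool:
    """Decide on the real client IP (as recovered from the PROXY line)."""
    ip = ipaddress.ip_address(client_ip)
    # Membership tests across IPv4/IPv6 simply return False.
    if any(ip in net for net in DENY):
        return False
    return any(ip in net for net in ALLOW)
```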
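This also makes the “balance source” algorithm much more efficient, since the hash is now computed on each client's IP address rather than on the single IP of the reverse-proxy 😉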
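Here is why this matters for “balance source”: the algorithm hashes the source IP to pick a server. If every connection appears to come from the reverse-proxy, every request lands on the same server. The sketch below uses `zlib.crc32` as a stand-in hash and ignores server weights, so it is not HAProxy's exact computation. `pick_server` and the addresses are hypothetical.

```python
import ipaddress
import zlib


def pick_server(source_ip: str, servers: list[str]) -> str:
    """Map a source IP to a server: same IP -> same server (for a fixed list).

    zlib.crc32 is an illustrative stand-in, not HAProxy's actual hash.
    """
    key = ipaddress.ip_address(source_ip).packed
    return servers[zlib.crc32(key) % len(servers)]
```

Without the proxy protocol, the load-balancer only sees the reverse-proxy's address, so all clients collapse onto one server. With it, the spread follows the real client addresses.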
Between the Load-Balancer and the server
Here we are!!!!
Since the LB knows the client IP, we can use it to connect to the server. Yes, this is some kind of spoofing :).
In your HAProxy configuration, just use the source parameter in your backend:
backend bk_app
  [...]
  source 0.0.0.0 usesrc clientip
  server srv1 192.168.11.1 check
The load-balancer uses the client IP information provided by the proxy protocol to connect to the server, so the server sees the client IP as the source IP.
Since the server receives a connection from the client IP address, it sends its responses to that address through its default gateway.
To process the reverse-NAT, this traffic must pass through the load-balancer, which is why the server's default gateway must be the load-balancer.
This is the only architecture change.
The architecture shown in the diagram only works with ALOHA load-balancers or HAProxy.