An HAProxy load balancer allows you to mask IP addresses in order to protect the privacy of your users. Read on to learn more.
How do load balancers fit into the larger debate over data privacy and security? It helps to step back and consider how HAProxy adds a layer of defense at the edge of your network.
Because your HAProxy load balancer is a reverse proxy that routes traffic to your services, it can act as a powerful first layer of defense. First, access control lists (ACLs) give you the power to define rules that can deny, tarpit and reroute hackers. Next, there are stick tables. A stick table is a type of in-memory storage that allows you to correlate user behavior across requests. Stick tables give you the ability to detect and protect against bot threats. Maps are another type of storage. They consist of key-value pairs and allow dynamic lookups of whitelists, blacklists and geolocation data. You can also use maps to set distinct rate limit thresholds for different URL paths. These building blocks combine to give you a myriad of ways to respond to threats.
Data privacy is another facet of security that, in some instances, can be managed at the edge of your network. Since HAProxy has the capabilities to inspect, filter and relay the traffic flowing through it, it has access to information about your users. In this blog post, we’ll focus on a key piece of information collected: the client’s IP address. Care must be taken to either safeguard it or, as you’ll read here, anonymize it so that it can’t be used to identify any particular person.
Data Protection Regulation Matters
It’s hard to have missed the rollout of the General Data Protection Regulation (GDPR), the law that protects EU citizens’ right to data privacy. People everywhere were inundated with emails asking them to consent to company data protection policies. Many organizations scrambled to become compliant, fearing fines of up to €20 million or 4% of annual global turnover, whichever is higher.
Even if you aren’t regularly providing service to people in the European Union, it’s probably safer to act as though you were. It’s also easy to predict similar regulation being mandated in other countries. It’s better to get ahead of the curve and put mechanisms in place that will safeguard your data.
If you’re using HAProxy Enterprise, then you have access to deeper insights about your customers. The Digital Element and MaxMind modules enable you to associate geolocation data with an IP address. This can be extremely helpful in rate limiting users based on region or for collecting customer demographics, but it strengthens the need to protect that data.
You also have the ability to use 51Degrees device detection to learn which types of devices your customers are using. This lets you create routing rules that are based on how your customers are accessing your services. Or, you can enrich the data forwarded to your backend servers by adding this device-detection information. With this increased intelligence comes the heightened responsibility to ensure that you’re protecting the personal data of your users.
Safeguarding personal data is clearly important. So what constitutes personal data?
What Constitutes Personal Data?
The GDPR defines personal data as meaning:
any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
You’d probably agree that most of this definition is what you’d expect. A person’s name or their unique ID number clearly points to a specific person. Information that distinguishes them from others, such as physical, genetic, or social identity, could also be used to recognize a person. But maybe the mention of an online identifier surprises you. Generally, this is interpreted to include an IP address.
The average bad guy will often have a hard time knowing who someone is by their IP address alone. For me, running a whois command with my IP returns information about my ISP, not me specifically. That isn’t always the case though, and a government agency may ask an ISP to connect information related to an IP address to data that you have. Whether it’s a big risk or a small one, the consensus is that the GDPR requires you to treat IP addresses as personal data.
Okay, so you should handle IP addresses as personal data. That means that you shouldn’t store them for longer than necessary. You might even consider whether you need to store full IP addresses at all. In the next section, you’ll see how to mask IP addresses.
Masking IPv4 Addresses
If you’ve enabled logging in HAProxy (hint: you should, it’s awesome), then you’re likely already capturing the client’s IP. Assuming that you’re using the default log format, it will be listed near the beginning of each line in the log.
Suppose the IP address 192.168.50.5 is captured and displayed in the logs. You can enable IP masking to depersonalize that address, saving only a portion of the IP and changing the rest to zeros.
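A minimal sketch of such a configuration is shown here. It assumes a frontend named fe_main that listens on port 80, a backend named be_servers (both names are placeholders), and a defaults section that already sets mode http and logging:

```haproxy
frontend fe_main
    # Assumed listener; adjust to your environment
    bind :80
    # Overwrite the source address with its /24-masked form,
    # so 192.168.50.5 becomes 192.168.50.0
    http-request set-src src,ipmask(24)
    # Placeholder backend name
    default_backend be_servers
```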
The http-request set-src directive changes the source IP address for the rest of the transaction. It uses the src fetch method to get the client’s 32-bit IPv4 address and then the ipmask converter to mask a certain number of bits. The converter takes a subnet mask as a parameter, which specifies how many bits of the IP you want to show. Passing in 24, or 255.255.255.0 which is equivalent, sets the IP 192.168.50.5 to 192.168.50.0.
Notice how the first three octets of the address are left as-is, but the last octet is changed to a zero. Or, to give you another example, you could mask the last three octets by passing in 8.
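As a sketch, the 8-bit variant might look like this, again using the placeholder names fe_main and be_servers:

```haproxy
frontend fe_main
    bind :80
    # Keep only the first octet; ipmask(8) is equivalent to ipmask(255.0.0.0)
    http-request set-src src,ipmask(8)
    default_backend be_servers
```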
This would store only the first octet of the IP in the logs, so 192.168.50.5 would be recorded as 192.0.0.0.
Keep in mind that the http-request set-src directive affects the rest of the transaction. That means that the IP will be masked in the logs, but it also means that where you place this line in your configuration matters. Think about this: if a rule that denies requests from 192.168.50.5 comes after the set-src line, will the configuration deny the request and also mask the IP in the logs?
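The ordering in question can be sketched as follows, with the mask applied before the deny rule (fe_main and be_servers are placeholder names):

```haproxy
frontend fe_main
    bind :80
    # Step 1: the source address is masked to its /24 network
    http-request set-src src,ipmask(24)
    # Step 2: this rule now compares against the already-masked address
    http-request deny if { src 192.168.50.5 }
    default_backend be_servers
```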
The answer is no. It will mask the IP, but then the deny line doesn’t work, since it comes second and ends up working with the masked IP address, 192.168.50.0, which does not match the address 192.168.50.5. Suppose you switched the order, placing the deny line first.
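A sketch of the reversed ordering, under the same placeholder names:

```haproxy
frontend fe_main
    bind :80
    # Step 1: the deny rule sees the original, unmasked address
    http-request deny if { src 192.168.50.5 }
    # Step 2: only reached when the request was not denied
    http-request set-src src,ipmask(24)
    default_backend be_servers
```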
In this case, the request will be denied if the client’s IP matches the given address, but that deny rule stops the processing of any additional http-request lines. So, the IP will not be masked in the logs. Changing the ACL rule to a named ACL also fails to both deny a request and log a masked IP.
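The named-ACL variant might be written like this. The ACL name blocked_ip is hypothetical, as are fe_main and be_servers:

```haproxy
frontend fe_main
    bind :80
    # Declaring the ACL does not evaluate it yet
    acl blocked_ip src 192.168.50.5
    http-request set-src src,ipmask(24)
    # The ACL is evaluated here, against the masked address
    http-request deny if blocked_ip
    default_backend be_servers
```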
The acl line isn’t evaluated until the deny line is hit, after the source IP has been changed, giving you the same result. Long story short, http-request set-src works well unless you need to take some action that relies on knowing the original value of the IP address. In those scenarios, to successfully mask the IP in the logs and perform another action, your best bet is to store the masked IP in a variable. Then, use a custom log format to record the masked IP while the rest of your configuration can use the full address.
With this approach, the masked IP is stored in a variable called txn.src_masked, and that variable is recorded in the log by using a custom log-format.
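Here is a sketch of that approach. It assumes the placeholder names fe_main and be_servers, and it starts from the log-format string that the HAProxy documentation gives as the equivalent of the default HTTP log format, swapping the client IP field (%ci) for the variable:

```haproxy
frontend fe_main
    bind :80
    # Store the masked address first, so it is set even for denied requests
    http-request set-var(txn.src_masked) src,ipmask(24)
    # Rules after this point still see the original source address
    http-request deny if { src 192.168.50.5 }
    # Default HTTP log format with %ci replaced by the masked variable
    log-format "%[var(txn.src_masked)]:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
    default_backend be_servers
```

Placing the set-var line ahead of the deny rule matters here: a deny stops further http-request processing, so a variable set afterwards would be empty in the log entry for a denied request.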
With this setup, the log will show the masked IP, but the rest of the HAProxy configuration can operate on the original address, such as to deny specific IPs. Something to be aware of is that an upstream web server can still get the true IP address, such as when you send it in an X-Forwarded-For header by using the option forwardfor directive. The IP will only be masked within the HAProxy logs. However, you can set X-Forwarded-For with http-request set-header instead of using option forwardfor and use the txn.src_masked variable as its value.
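A sketch of forwarding the masked address upstream, assuming option forwardfor is not enabled for this frontend and using the placeholder names fe_main and be_servers:

```haproxy
frontend fe_main
    bind :80
    http-request set-var(txn.src_masked) src,ipmask(24)
    # Send the masked address to the backend servers instead of the real one
    http-request set-header X-Forwarded-For %[var(txn.src_masked)]
    default_backend be_servers
```

Note that set-header replaces any X-Forwarded-For header already present on the request, whereas option forwardfor adds a new one. Whether that difference matters depends on what sits in front of HAProxy.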
Masking IPv6 Addresses
You can mask IPv6 addresses as well. An IPv6 address is made up of 128 bits, separated into eight 16-bit groups. Typically, the first 64 bits of the address specify the network and the last 64 specify the computer to which the address is assigned. When a frontend listens on all IPv4 and IPv6 addresses assigned to the HAProxy load balancer, the ipmask converter can take two parameters so that it shows the first 24 bits of an IPv4 address and the first 64 bits of an IPv6 address.
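A sketch of such a dual-stack frontend, assuming a HAProxy version whose ipmask converter accepts the optional IPv6 mask, and using the placeholder names fe_main and be_servers:

```haproxy
frontend fe_main
    # Listen on the IPv6 wildcard address and accept IPv4 as well
    bind :::80 v4v6
    # First parameter masks IPv4 sources, second masks IPv6 sources
    http-request set-src src,ipmask(24,64)
    default_backend be_servers
```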
My virtual machine’s link-local IPv6 address is fe80::a00:27ff:fedd:dd19. When I go to http://[fe80::a00:27ff:fedd:dd19] in my browser now, the log only records the network portion of the address, fe80::.
In this scenario, both IPv4 and IPv6 addresses will be masked, depending on which one the client uses. Note that an IPv6 parameter must follow an IPv4 parameter. You cannot specify an IPv6 mask (e.g. 64) alone.
In this post, we demonstrated the importance of considering how your HAProxy load balancer can protect the privacy of your users’ data, helping to keep you compliant with laws like the GDPR. Note that I am not a lawyer and you should consult with one to ensure that you are truly compliant with the law.
HAProxy gives you the building blocks to guard your infrastructure against a wide variety of threats, and masking IP addresses is simply one more feature. Use the ipmask converter to mask portions of an IP address which might otherwise be used to identify a person.