How to Extract Insightful Data From Proxy Protocol Packets

Making your load balancer traffic more transparent pays off. Web applications continually pass information back and forth, yet some of this important data is hard to get at while it is in transit. Networking can feel like an overwhelming “black box,” but what if you could peek behind the curtain to better understand your traffic?

This starts with preserving crucial data such as the client’s source IP address. The PROXY protocol transmits the original client IP address and can also carry additional information alongside the connection. In fact, many deployments attach extra type-length-value (TLV) fields like PP2_TYPE_ALPN, among others. We developed this protocol in-house at HAProxy, and it's now widely supported by modern web infrastructure. However, there’s some minor work involved in extracting and interpreting this header information. That's where popular external tools like TShark can help.

In this guide, we’ll first explain why connection data matters. Next, you’ll learn how the PROXY protocol works, how to use the TShark analyzer to capture and inspect packets, and how to view the extraction results.

What makes traffic data so valuable?

Theoretically, each request flowing through a load balancer like HAProxy can contain a client IP address, port information, a destination IP address, and even a virtual private cloud (VPC) subnet ID. These bits of data enable connection validation. For example, parsing the header can help us identify good hosts with legitimate identities. Equally, HAProxy can use that information to flag bad hosts and halt traffic accordingly.

At the highest level, these parameters provide important context behind requests and boost the transparency of client-server communication pathways. It’s easier to understand how traffic is flowing and possibly identify configuration errors.

Finally, data packaged via the PROXY protocol lets us chain multiple layers of NAT or TCP proxies together while keeping the original IP address. For example, the PROXY protocol header lets traffic pass through subsequent firewalls and proxies without losing track of where it came from.

What is the PROXY protocol?

The PROXY protocol provides a convenient way to transport connection information safely between client and server. Without a load balancer in the middle, the server could retrieve this data directly from the connection itself. This includes the following in HAProxy:

An address family, such as AF_INET for IPv4, AF_INET6 for IPv6, and AF_UNIX
Socket protocols for TCP and UDP
Layer 3 source and destination addresses
Any Layer 4 source and destination ports

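We designed the PROXY protocol to be quickly parseable. While Version 1 focused on human readability, Version 2 adds binary encoding support to accelerate this process even further. The protocol serves a purpose similar to protocol-specific mechanisms such as the X-Forwarded-For header in HTTP or the XCLIENT command in SMTP, but it works regardless of which protocol is carried over the connection.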
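
To make the human-readable Version 1 format concrete, here is a minimal Python sketch that parses a single Version 1 header line. It is an illustration written against the public PROXY protocol specification, not HAProxy's own parser; the names ProxyV1 and parse_proxy_v1 are hypothetical, and the sample addresses are borrowed from the TShark output later in this guide.

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class ProxyV1:
    family: str      # "TCP4", "TCP6", or "UNKNOWN"
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> ProxyV1:
    """Parse one PROXY protocol Version 1 header line, CRLF included."""
    if not line.startswith(b"PROXY ") or not line.endswith(b"\r\n"):
        raise ValueError("not a PROXY v1 header line")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        # Receivers ignore whatever follows UNKNOWN on the line.
        return ProxyV1("UNKNOWN", "", "", 0, 0)
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header line")
    _, family, src, dst, sport, dport = parts
    # Check that both addresses match the declared address family.
    expected_version = 4 if family == "TCP4" else 6
    for addr in (src, dst):
        if ipaddress.ip_address(addr).version != expected_version:
            raise ValueError("address does not match " + family)
    return ProxyV1(family, src, dst, int(sport), int(dport))


header = parse_proxy_v1(b"PROXY TCP4 192.168.64.1 192.168.64.120 49344 8081\r\n")
print(header.src_addr, header.src_port)
```

Because the whole header is a single ASCII line, a receiver can read up to the first CRLF, split on spaces, and hand the rest of the stream to the application untouched.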

How the PROXY protocol works

We know that the PROXY protocol embeds connection information. But how does everything work under the hood?

We've designed the protocol to reduce information processing overhead without requiring users to make sweeping changes to their backend components. Plus, the protocol prevents the typical connection parameter losses that occur when relaying TCP connections through a proxy.

This sidesteps what we call the “dumb proxy” problem, where a load balancer processes protocol-agnostic data without knowing which protocol is transported atop the connection. HAProxy can run in pure TCP mode and fall into this category.

This isn’t a negative aspect in and of itself. However, technical challenges can arise when using the keep-alive directive. Only the first request on a new connection will pass the X-Forwarded-For header or the Forwarded extension, which is a problem on long-lived connections that handle multiple requests. Since the client information within those headers often remains unchanged, sending a header with each request is unnecessary.

Packaging information via the PROXY protocol solves this problem. HAProxy can prepend each connection with a header that reports insightful connection characteristics for the other side. This is easy to implement without protocol-specific knowledge. Plus, we can eliminate any dangers and limitations of caching.
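
To picture what "prepending each connection with a header" means on the wire, here is a minimal Python sketch that assembles a Version 2 header for a TCP-over-IPv4 connection. It follows the layout in the public PROXY protocol specification and is not HAProxy's implementation; the function name build_proxy_v2_ipv4 and the sample addresses are assumptions made for illustration.

```python
import socket
import struct

# 12-byte signature that opens every Version 2 header.
V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_ipv4(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build a PROXY protocol v2 header for a proxied TCP/IPv4 connection."""
    # Address block: source address, destination address, source port,
    # destination port, all in network byte order (12 bytes for IPv4).
    addresses = struct.pack(
        "!4s4sHH", socket.inet_aton(src), socket.inet_aton(dst), sport, dport
    )
    # 0x21 = version 2 (high nibble) + PROXY command (low nibble).
    # 0x11 = AF_INET (high nibble) + STREAM/TCP (low nibble).
    fixed = struct.pack("!BBH", 0x21, 0x11, len(addresses))
    return V2_SIGNATURE + fixed + addresses


header = build_proxy_v2_ipv4("192.168.64.1", "192.168.64.120", 49344, 8081)
print(len(header), header.hex())
```

The sender writes these bytes once, immediately after the TCP connection opens and before any application data, which is why no protocol-specific knowledge is needed on either side.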

So, we now have this conveniently prepackaged data. How do users unpack it into something usable?

Introducing the TShark packet analyzer

The TShark tool, part of Wireshark’s open source packet-analysis software, lets us quickly and easily capture, read, and print PROXY protocol packets. It writes the PROXY protocol information it gathers to either terminal output or a destination file.

TShark also offers the following advantages:

CLI-based Wireshark filter integration
Direct packet capture
Packet capture (PCAP) analysis from a tcpdump

In fact, TShark’s default configuration closely emulates tcpdump’s functionality. The tcpdump tool also ships with most Linux-based operating systems, so many users may already be familiar with it. While HAProxy can interpret and understand the contents of these packets for routing purposes, it can't natively capture packets the way tcpdump does.

TShark’s manual states that it grabs data “from the first available network interface and displays a summary line on the standard output for each received packet.” Luckily, TShark is a user-friendly tool. Let’s dive into a technical example that uses TShark to unearth data from PROXY protocol packets.

Set up TShark and test your packet capture

Before combing through your packets, you’ll have to install TShark from the command line if your Linux distribution doesn’t already ship with it:

For Red Hat Enterprise Linux (RHEL) and RHEL clones, enter the following command:

sudo yum install wireshark

For Ubuntu and Debian, enter the following:

sudo apt-get install tshark

TShark should now be available! However, we recommend testing basic packet capture to ensure that everything is working correctly. Use the following command to capture TCP traffic for five seconds on every port except port 22, which avoids grabbing your own SSH traffic:

tshark -f "tcp port not 22" -a "duration:5"

If you want to analyze your output later, you can also specify a destination file using the -w FILENAME argument. Simply enter the tshark -r FILENAME command to read its contents.

Next, the process for solely targeting PROXY protocol packets is a little different. Here’s how to do it.

Finding and unpacking PROXY protocol packets

Capturing, identifying, and unpacking PROXY protocol packets is easy using Wireshark's display filters. For example, you can filter for only the relevant packets, and there are multiple proxy traffic filters to choose from.

In most cases, a PROXY protocol packet has an embedded Source IP. You can find it with the following command:

tshark -r FILENAME -Y proxy.src.ipv4

This will only list packets that contain PROXY protocol information. So, what if you want to view the contents of those packets? Simply add the -V flag, which gives you the following CLI command:

tshark -r FILENAME -Y proxy.src.ipv4 -V

As a result, your Terminal will display something similar to what you see below:

	ubuntu@bk:/etc/hapee-2.6# tshark -Y proxy.src.ipv4 -V
	...
	PROXY Protocol
	Magic: 0d0a0d0a000d0a515549540a
	0010 .... = Version: 2
	.... 0001 = Command: 1
	[Version: 2]
	Address Family Protocol: TCP over IPv4 (0x11)
	0001 .... = Address Family: IPv4 (0x1)
	.... 0001 = Protocol: 0x1
	Length: 12
	Source Address: 192.168.64.1
	Destination Address: 192.168.64.120
	Source Port: 49344
	Destination Port: 8081

That’s it! The entire contents of your PROXY protocol packet are now viewable and ready for inspection. You can repeat this same process for other PROXY packets.
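
The fields TShark prints map directly onto the Version 2 binary layout. As a rough sketch (not TShark's own dissector), the Python below decodes the same header by hand. The function name parse_proxy_v2 and the SAMPLE bytes are assumptions made for illustration: the sample is built to match the addresses and ports in the output above, and only the IPv4 case is handled.

```python
import ipaddress
import struct

V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"  # shown by TShark as "Magic"

# Hypothetical capture: signature, 0x21, 0x11, length 12, then addresses/ports.
SAMPLE = bytes.fromhex(
    "0d0a0d0a000d0a515549540a"  # magic
    "21"                        # version 2, command 1 (PROXY)
    "11"                        # address family IPv4, protocol 0x1
    "000c"                      # length of the address block: 12
    "c0a84001"                  # source address 192.168.64.1
    "c0a84078"                  # destination address 192.168.64.120
    "c0c0"                      # source port 49344
    "1f91"                      # destination port 8081
)


def parse_proxy_v2(data: bytes) -> dict:
    """Decode the fixed part of a PROXY v2 header plus an IPv4 address block."""
    if len(data) < 16 or data[:12] != V2_SIGNATURE:
        raise ValueError("not a PROXY v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")
    fields = {
        "version": ver_cmd >> 4,
        "command": ver_cmd & 0x0F,
        "family": fam_proto >> 4,
        "protocol": fam_proto & 0x0F,
        "length": length,
    }
    payload = data[16:16 + length]
    if len(payload) < length:
        raise ValueError("truncated PROXY v2 header")
    if fields["family"] == 0x1:  # IPv4 address block is 12 bytes
        if length < 12:
            raise ValueError("IPv4 address block too short")
        src, dst, sport, dport = struct.unpack("!4s4sHH", payload[:12])
        fields.update(
            src_addr=str(ipaddress.IPv4Address(src)),
            dst_addr=str(ipaddress.IPv4Address(dst)),
            src_port=sport,
            dst_port=dport,
        )
    return fields


for name, value in parse_proxy_v2(SAMPLE).items():
    print(name, value)
```

Any bytes in the address block beyond the first 12 would be TLV fields, which this sketch leaves undecoded.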

Inspecting traffic while using TLS

In most cases, your traffic will use TLS. However, these TLS packets can confuse tools like Wireshark, preventing them from correctly identifying PROXY protocol packets. This is especially true for PROXY protocol v2. Version 2 starts with a binary signature that is deliberately unparseable as SSL/TLS (among other protocols), so a receiver that isn't expecting the header fails immediately instead of misreading it.

Luckily, you can use the following command to easily disable TLS protocol detection (for TShark testing purposes) during packet capture and restore PROXY packet detection:

tshark --disable-protocol tls -Y proxy.src.ipv4 -V

This produces a similar output to our original command.
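
As a rough illustration of how the first bytes of a connection distinguish a PROXY header from a TLS record, consider the Python sketch below. It is not how Wireshark's dissectors are implemented; the function name classify_first_bytes and its return labels are hypothetical. It only shows that the Version 2 signature, the Version 1 "PROXY " prefix, and a TLS handshake record all begin differently.

```python
V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def classify_first_bytes(data: bytes) -> str:
    """Guess what a connection starts with from its leading bytes."""
    if data.startswith(V2_SIGNATURE):
        return "proxy-v2"
    if data.startswith(b"PROXY "):
        return "proxy-v1"
    # A TLS record starts with a content type (0x16 = handshake)
    # followed by a protocol version whose major byte is 0x03.
    if len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"
    return "unknown"


print(classify_first_bytes(V2_SIGNATURE + b"\x21\x11\x00\x0c"))
print(classify_first_bytes(b"\x16\x03\x01\x02\x00"))
```

On a connection that carries both, the PROXY header comes first and the TLS handshake follows it, so a tool has to recognize the header before it tries to interpret the remaining bytes as TLS.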

Testing new PROXY protocol requests

While you can now grab important data, you'll sometimes need to generate PROXY protocol requests to test your data capture. You can send these requests with the following cURL utility command:

curl --haproxy-protocol http://<your load balancer>/...

Reminder:

You'll need to add the accept-proxy parameter to the bind line of your configuration (for example, bind :80 accept-proxy) for this to work properly. This enables HAProxy to accept the PROXY protocol (both Version 1 and Version 2).

Running this command will verify that your connection is working, and that nothing is preventing PROXY protocol requests from flowing through HAProxy. Here's an example output using the curl --haproxy-protocol -v http://localhost command:

	* Trying ::1:80...
	* connect to ::1 port 80 failed: Connection refused
	* Trying 127.0.0.1:80...
	* Connected to localhost (127.0.0.1) port 80 (#0)
	> PROXY TCP4 127.0.0.1 127.0.0.1 53606 80
	> GET / HTTP/1.1
	> Host: localhost
	> User-Agent: curl/7.74.0
	> Accept: */*
	>
	* Mark bundle as not supporting multiuse
	< HTTP/1.1 302 Found
	< content-length: 0
	< location: /hapee-stats
	< cache-control: no-cache
	<
	* Connection #0 to host localhost left intact
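
If curl isn't available, a similar test request can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a replacement for curl: the function names build_v1_header and send_with_proxy_header are hypothetical, the header is always Version 1, and the source and destination values are simply read from the local socket, which mirrors the PROXY line in the output above.

```python
import socket


def build_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol Version 1 header line."""
    family = "TCP6" if ":" in src_ip else "TCP4"
    line = f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return line.encode("ascii")


def send_with_proxy_header(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send a v1 header, then a plain HTTP GET."""
    with socket.create_connection((host, port), timeout=5) as sock:
        src_ip, src_port = sock.getsockname()[:2]
        dst_ip, dst_port = sock.getpeername()[:2]
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        ).encode("ascii")
        # The PROXY header must be the very first bytes on the connection.
        sock.sendall(build_v1_header(src_ip, dst_ip, src_port, dst_port) + request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_v1_header("127.0.0.1", "127.0.0.1", 53606, 80))
```

Calling send_with_proxy_header("localhost") against a listener that has accept-proxy enabled should return a normal HTTP response, while a listener that does not expect the header will typically reject the request.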

Important header information at your fingertips

The PROXY protocol is an incredibly useful mechanism for transporting important connection information through HAProxy. This data is valuable to users who want more insight into their traffic, without having to make major functional concessions.

And while some steps are required to unpack that information, tools like TShark make the process pretty painless. The next time you want to inspect the contents of your PROXY protocol packets, you can do so without jumping through hoops or possessing niche technical knowledge.
