HAProxy Traffic Mirroring for Real-World Testing

Traffic mirroring makes it possible to stream production traffic to a test or staging environment. Use the HAProxy Traffic Shadowing agent to enable mirroring.

The HAProxy Stream Processing Offload Engine (SPOE) lets you stream data to an external agent in real time where it can be processed by a programming language of your choice, including C, .NET Core, Go, Lua and Python. This opens the door to extending HAProxy in many ways. We described the architecture of the SPOE in our blog post Extending HAProxy with the Stream Processing Offload Engine.

As part of the HAProxy 2.0 release, a new agent was introduced that uses the SPOE to mirror traffic to another environment. The Traffic Shadowing agent captures traffic, or a percentage of it, and sends a copy to another URL. You’d use it to send production traffic to a QA testing environment in order to validate a new version of a feature before it’s made public. That way, you reduce the risk of discovering a bug only after the feature is released.

In this blog post, we’ll demonstrate how to use mirroring to send samples of production traffic to your QA environment.

Real-World Traffic Without the Real-World Impact

Imitating real users is hard. Real users stress an application in ways that are difficult to reproduce artificially. For example, two users may perform unrelated tasks in different parts of the application simultaneously, which then triggers an unknown bug caused by the confluence of their actions. Race conditions, deadlocks, and other threading problems often surface under a realistic load.

Traffic mirroring, or traffic shadowing, is a technique in which live, production traffic is copied and sent to two places: the original production servers and a staging or test environment. That test environment may be segregated into a separate network that is not publicly accessible. As long as the new version of the feature accepts the same URLs and parameters as the requests being mirrored, you can validate it against real traffic before it’s released.

The value of traffic mirroring is that you can do this without impacting your users. Mirroring traffic using the Traffic Shadowing daemon is fire and forget. Copying a request and sending it to the test environment has almost no impact on the time needed to process the original request, because the client does not need to wait for a response from the test server. You can also configure the daemon to capture only a portion of the traffic so that your test environment doesn’t need the infrastructure that’s necessary to handle production-level amounts of requests.

Setting up Traffic Mirroring

Clone or download the spoa-mirror source code repository and follow the instructions for building it. For example, on Ubuntu Bionic, you would build and install it like this:

	$ sudo apt update
	$ sudo apt install -y autoconf automake build-essential git libcurl4-openssl-dev libev-dev libpthread-stubs0-dev pkg-config
	$ git clone https://github.com/haproxytech/spoa-mirror
	$ cd spoa-mirror
	$ ./scripts/bootstrap
	$ ./configure
	$ make all
	$ sudo cp ./src/spoa-mirror /usr/local/bin/


Add the --enable-debug flag when you call configure if you want to see more verbose output from the agent, like this:

	$ ./configure --enable-debug


After you’ve followed these steps, the spoa-mirror program will be available on your PATH. Start it by passing the --runtime argument, which sets how long it should run before exiting (e.g. --runtime 1h for one hour or 0 for unlimited time), and the --mirror-url argument, which sets the URL where you want to send the mirrored traffic.

	$ spoa-mirror --runtime 0 --mirror-url http://test.local --logfile /var/log/haproxy-mirror.log


The agent listens on all IP addresses at port 12345 by default. You can change these settings with the --address and --port arguments. You can also pass it the --daemonize argument to run the program in the background.
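
Before wiring the agent into HAProxy, it can help to confirm that something is actually accepting connections on the agent's port. The following is a minimal Python sketch, not part of spoa-mirror. The function name agent_is_listening is made up for illustration, and the defaults assume the agent's default port of 12345 on the local machine.

```python
import socket


def agent_is_listening(host: str = "127.0.0.1", port: int = 12345,
                       timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling agent_is_listening() with no arguments checks 127.0.0.1:12345. If you started the agent with --address or --port, pass the same values here. A successful connection only shows that the port is open, not that the process behind it is spoa-mirror.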

Starting with HAProxy version 2.0, you can have HAProxy manage the lifetime of the agent. Use the HAProxy Process Manager to control starting the daemon when HAProxy starts. Add a program section that contains a command directive to your HAProxy configuration, as shown:

	program mirror
	command spoa-mirror --runtime 0 --mirror-url http://test.local


When you start HAProxy, you’ll see that the spoa-mirror agent is running alongside it:

	$ sudo systemctl status haproxy

	haproxy.service - HAProxy Load Balancer
	Main PID: 1177 (haproxy)
	Tasks: 14 (limit: 1152)
	CGroup: /system.slice/haproxy.service
	├─1177 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock -sf 1209
	├─2081 spoa-mirror --runtime 0 --mirror-url http://localhost:81 --address 127.0.0.1
	└─2082 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock -sf 1209


Now the daemon is set up to receive requests and forward them to the test server.
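
If you don't have a test server ready yet, a throwaway HTTP endpoint that records whatever it receives is enough to see mirrored requests arrive. Here is a minimal Python sketch using only the standard library. The names MirrorSink, start_sink and received, and the port 8081, are assumptions for illustration. You would point --mirror-url at whatever address you choose.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # one dict per request, in arrival order


class MirrorSink(BaseHTTPRequestHandler):
    """Accept any request, record it, and reply with an empty 204."""

    def _record(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        received.append({"method": self.command, "path": self.path, "body": body})
        self.send_response(204)
        self.end_headers()

    # Route the common methods to the same recorder.
    do_GET = do_POST = do_PUT = do_DELETE = _record

    def log_message(self, fmt, *args):
        pass  # stay quiet; the `received` list is the record


def start_sink(host="127.0.0.1", port=8081):
    """Start the sink in a background thread and return the server object."""
    server = HTTPServer((host, port), MirrorSink)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the sink started on port 8081, running the agent with --mirror-url http://127.0.0.1:8081 should cause entries to accumulate in received as traffic is mirrored. This stand-in only proves that requests arrive; it does not exercise your application.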

Did you know?

If you’ve enabled debugging, you can send the agent’s log messages to a file by adding the --logfile flag to the spoa-mirror command. Prefix the filename with either w: to overwrite the file if it exists or a: to append to the file, such as --logfile a:/var/log/spoa-mirror.log.

The next step is to configure HAProxy to send traffic to the agent. Add a filter spoe directive to your frontend that references a file named mirror.conf, as shown:

	# Production frontend
	frontend fe_main
	mode http
	bind :80
	option http-buffer-request
	filter spoe engine mirror config /etc/haproxy/mirror.conf
	default_backend servers


We will cover what goes into mirror.conf in the next section.

Next, in addition to the backend that holds your production servers, add a backend that contains the address of the spoa-mirror agent. Here’s an example:

	# Production servers
	backend servers
	mode http
	server s1 prodserver:80

	# Mirror agents
	backend mirroragents
	mode tcp
	balance roundrobin
	timeout connect 5s
	timeout server 5s
	server agent1 localhost:12345


In this example, production traffic is received at port 80. It is then sent to the servers backend like normal, but it’s also mirrored to the mirroragents backend, which relays it to the agent listening at localhost:12345. For this to work, you have to set up mirror.conf, which you’ll see in the next section.

The mirror.conf File

The filter spoe directive in your frontend lists a config parameter that points to an SPOE configuration file to use. Create that file now at /etc/haproxy/mirror.conf. An engine parameter sets a label that must match a section in mirror.conf. We’ve arbitrarily named it mirror in this case. It’s only important that they match. Add the following to the file:

	[mirror]
	spoe-agent mirror
	log global
	messages mirror
	use-backend mirroragents
	timeout hello 500ms
	timeout idle 5s
	timeout processing 5s

	spoe-message mirror
	args arg_method=method arg_path=url arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
	event on-frontend-http-request


With this file, you configure how HAProxy will communicate with the agent(s). The file begins with an engine name, mirror, in square brackets. As mentioned, this must match the engine value set on the filter line in the HAProxy configuration.

The log global line means that events, such as when HAProxy sends data, will be logged to the same output defined by the log statement in the global section of the HAProxy configuration. The messages line is a space-delimited list of labels that match up with spoe-message sections. The use-backend line specifies which backend in the HAProxy configuration holds the mirror agents.

You can also set timeouts for various parts of the HAProxy-to-agent communication. The timeout hello setting limits how long HAProxy will wait for an agent to acknowledge a connection. The timeout idle setting limits how long HAProxy will wait for an agent to close an idle connection. The timeout processing setting limits how long an agent is allowed to process an event.

A spoe-message section defines which HAProxy fetch methods will be used to capture data to send to the agents. The label here, mirror, is expected by this particular agent. For traffic mirroring, we capture the following:

the HTTP method
the URL path
the version of HTTP
all HTTP headers
the request body (note that this requires option http-buffer-request in the HAProxy configuration)
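
To make the captured fields more concrete, here is a Python sketch of how an agent might turn them back into the head of an HTTP request. It is not taken from spoa-mirror. It assumes, based on our reading of the HAProxy documentation, that req.hdrs_bin is a sequence of length-prefixed name and value strings, that the lengths use the SPOE variable-length integer encoding, and that an empty name followed by an empty value ends the list. The function names are made up for illustration.

```python
def decode_varint(buf: bytes, pos: int):
    """Decode one SPOE variable-length integer; return (value, next_pos)."""
    value = buf[pos]
    pos += 1
    if value < 240:          # small values fit in a single byte
        return value, pos
    shift = 4
    while True:              # continuation bytes carry 7 more bits each
        byte = buf[pos]
        pos += 1
        value += byte << shift
        shift += 7
        if byte < 128:       # a byte below 128 is the last one
            return value, pos


def decode_hdrs_bin(buf: bytes):
    """Return [(name, value), ...] from a req.hdrs_bin payload."""
    headers, pos = [], 0
    while pos < len(buf):
        name_len, pos = decode_varint(buf, pos)
        name = buf[pos:pos + name_len]
        pos += name_len
        value_len, pos = decode_varint(buf, pos)
        value = buf[pos:pos + value_len]
        pos += value_len
        if name_len == 0 and value_len == 0:
            break            # an empty name/value pair marks the end
        headers.append((name.decode("latin-1"), value.decode("latin-1")))
    return headers


def rebuild_request_head(method: str, url: str, version: str, hdrs_bin: bytes) -> str:
    """Reassemble the request line and headers from the message arguments."""
    lines = [f"{method} {url} HTTP/{version}"]
    lines += [f"{name}: {value}" for name, value in decode_hdrs_bin(hdrs_bin)]
    return "\r\n".join(lines) + "\r\n\r\n"
```

The three string arguments correspond to arg_method, arg_path and arg_ver in the spoe-message section, and the body captured in arg_body would follow the blank line that ends the head.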

Data is sent every time that the on-frontend-http-request event fires, which is before the evaluation of http-request rules on the frontend side. Once you have this file in place, restart HAProxy for it to take effect. You should see requests to the Traffic Shadowing daemon appear in the log at /var/log/haproxy.log:

	SPOE: [mirror] <EVENT:on-frontend-http-request> sid=0 st=0 0/1/0/0/1 1/1 0/0 0/1
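
If you want a quick tally of these entries, a few lines of Python will do. This sketch parses only the fields that are visibly labeled in the line above (the engine name, the event, sid and st) and leaves the slash-separated counters alone. Treating st as a status code is our assumption, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

SPOE_LINE = re.compile(
    r"SPOE: \[(?P<engine>[^\]]+)\] <EVENT:(?P<event>[^>]+)> "
    r"sid=(?P<sid>\d+) st=(?P<status>\d+)"
)


def summarize(lines):
    """Count SPOE log entries per (engine, event, status) triple."""
    counts = Counter()
    for line in lines:
        match = SPOE_LINE.search(line)
        if match:  # lines that are not SPOE entries are skipped
            key = (match["engine"], match["event"], int(match["status"]))
            counts[key] += 1
    return counts
```

Feeding it the lines of /var/log/haproxy.log gives a rough picture of how many requests were handed to the agent and whether any entries carry a non-zero st value.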


Tuning the Mirrored Traffic

There are a few ways to tune the traffic that gets mirrored. For one thing, you can add an ACL that limits the requests that get captured. For instance, if you only wanted to mirror traffic for requests to the /search feature on your site, you would ignore all requests except those that have a URL path beginning with /search, as shown:

	spoe-message mirror
	args arg_method=method arg_path=url arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
	event on-frontend-http-request if { path_beg /search }


You can also define named ACLs that do the same thing:

	spoe-message mirror
	args arg_method=method arg_path=url arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
	acl is_search path_beg /search
	event on-frontend-http-request if is_search


Or suppose you didn’t want to capture all traffic, but rather only a portion of it. You would simply add an ACL that collects a random sample of requests. In the next example, we generate a random number between 0 and 99 and only mirror the request if that number is less than 10, which samples roughly 10% of requests:

	spoe-message mirror
	args arg_method=method arg_path=url arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
	event on-frontend-http-request if { rand(100) lt 10 }
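
Off-by-one mistakes are easy to make when picking a sampling threshold. The short Python sketch below counts how many of the possible values satisfy a comparison, assuming that rand(100) yields integers from 0 through 99 as the HAProxy documentation describes. The helper name matching_fraction is made up for illustration.

```python
def matching_fraction(upper: int, predicate) -> float:
    """Fraction of the integers 0..upper-1 for which predicate is true."""
    hits = sum(1 for n in range(upper) if predicate(n))
    return hits / upper


print(matching_fraction(100, lambda n: n < 10))   # lt 10 -> 0.1
print(matching_fraction(100, lambda n: n <= 10))  # le 10 -> 0.11
```

Under that assumption, a less-than comparison against 10 matches ten of the hundred values, while less-than-or-equal matches eleven, so the two operators give 10% and 11% sampling respectively.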


Your ACL statements can also check values from map files. For example, you can switch mirroring on or off by using a map file that contains a key-value pair like mirroring on. Then, check the map file from your mirror.conf file like this:

	spoe-message mirror
	args arg_method=method arg_path=url arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
	acl mirroring_on str(mirroring),map(/etc/haproxy/mirroring.map) -m str on
	event on-frontend-http-request if mirroring_on


Use the HAProxy Runtime API to change the value in the map file to off.

	# Change mirroring to off
	$ echo "set map /etc/haproxy/mirroring.map mirroring off" | nc 127.0.0.1 9999

	# Show current value
	$ echo "show map /etc/haproxy/mirroring.map mirroring" | nc 127.0.0.1 9999
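
If you would rather flip the switch from a script than from the shell, the same Runtime API command can be sent over a plain TCP socket. This is a minimal Python sketch. It assumes the Runtime API is exposed on 127.0.0.1:9999 as in the commands above, and the function names runtime_command and set_mirroring are made up for illustration.

```python
import socket


def runtime_command(command: str, host: str = "127.0.0.1", port: int = 9999,
                    timeout: float = 2.0) -> str:
    """Send one Runtime API command over TCP and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def set_mirroring(enabled: bool, map_path: str = "/etc/haproxy/mirroring.map",
                  **conn) -> str:
    """Set the `mirroring` key in the map file to on or off."""
    state = "on" if enabled else "off"
    return runtime_command(f"set map {map_path} mirroring {state}", **conn)
```

Calling set_mirroring(False) sends the same text as the first nc command. Keep in mind that the change lives in HAProxy's memory, so the file on disk is not rewritten.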


You can also use the Data Plane API to add or remove filter spoe lines from the HAProxy configuration file dynamically. In the following example, we show the existing filters, then add a new one, and then remove it:

	# Show existing filters
	curl -X GET --user admin:mypassword "http://localhost:5555/v1/services/haproxy/configuration/filters?parent_name=fe_main&parent_type=frontend"

	# Add a filter line
	curl -X POST --user admin:mypassword "http://localhost:5555/v1/services/haproxy/configuration/filters?parent_name=fe_main&parent_type=frontend&version=1" -H "Content-Type: application/json" -d '{"id": 0, "spoe_config":"/etc/haproxy/mirror.conf", "spoe_engine":"mirror", "type": "spoe"}'
	{"id":0,"spoe_config":"/etc/haproxy/mirror.conf","spoe_engine":"mirror","type":"spoe"}

	# Remove a filter line
	curl -X DELETE --user admin:mypassword "http://localhost:5555/v1/services/haproxy/configuration/filters/0?parent_name=fe_main&parent_type=frontend&version=2" -H "Content-Type: application/json"
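
The curl calls above share most of their URL and differ mainly in the HTTP method, the version parameter and the optional filter ID. The Python sketch below builds the equivalent requests with the standard library without sending them, so you can inspect the result first. The endpoint and parameters are copied from the commands above, while the function name filter_request is made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:5555/v1/services/haproxy/configuration/filters"


def filter_request(method, user, password, frontend,
                   version=None, filter_id=None, payload=None):
    """Build (but do not send) a request against the filters endpoint."""
    params = {"parent_name": frontend, "parent_type": "frontend"}
    if version is not None:
        params["version"] = version  # same parameter as in the curl calls
    url = BASE + (f"/{filter_id}" if filter_id is not None else "")
    url += "?" + urllib.parse.urlencode(params)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}",
               "Content-Type": "application/json"}
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Passing the returned object to urllib.request.urlopen would send it. Note how the version value increases from 1 to 2 between the add and remove calls in the curl example, so a script would need to track it between calls.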


Use the Data Plane API to fully configure your load balancer using REST API commands.

Tips for Making the Most of Traffic Mirroring

I’ll leave you with a few ways to get the most out of traffic mirroring:

Set up monitoring and compare the errors you get from your production servers with those you get from the new version to which you’re mirroring traffic. Having a monitoring strategy in place will be key to validating a release.
Make use of HAProxy’s built-in metrics, which you can consume via the HAProxy Stats page or the Prometheus module, to see whether the new version of your feature performs better or worse.
Make sure that the feature you’re testing has URL paths and parameters that match the existing feature so that it is forward compatible with mirrored traffic. Forward compatibility may be a valuable test in and of itself.
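
As a starting point for the first tip, here is a small Python sketch that compares the share of server errors between two samples of response status codes. How you collect those codes (access logs, metrics, and so on) is up to you. The function names and the one percentage point tolerance are assumptions for illustration.

```python
def error_rate(status_codes):
    """Share of responses with a 5xx status; 0.0 for an empty sample."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if 500 <= code <= 599) / len(codes)


def compare(production, mirrored, tolerance=0.01):
    """Return (prod_rate, mirror_rate, regression) for two samples."""
    prod, mirror = error_rate(production), error_rate(mirrored)
    # Flag a regression only when the mirrored rate exceeds the
    # production rate by more than the chosen tolerance.
    return prod, mirror, (mirror - prod) > tolerance
```

For example, a production sample with 2 errors in 100 responses and a mirrored sample with 10 errors in 100 would be flagged, while two identical samples would not.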

Conclusion

In this blog post, you got a tour of the new Traffic Shadowing daemon, which uses the Stream Processing Offload Engine to capture live traffic and mirror it to a secondary URL. This is especially useful for vetting new versions of features before they’re released to the public. The best thing about it is your production traffic won’t be impacted by the mirroring. It’s a fire-and-forget process where the downstream client doesn’t need to wait for the mirroring agent to respond.

If you enjoyed this article and want to keep up to date on similar topics, subscribe to this blog! You can also follow us on Twitter and join the conversation on Slack.
