HAProxy integrates with AWS X-Ray to give you the best observability across your Amazon Web Services (AWS) resources, including your load balancer. Read on to learn how.
There is a trend to move away from monolithic applications towards microservices. Microservices run within separate processes, often on different machines, and interservice communication happens over a network. This makes debugging more challenging. In the past, a single, process-level stack trace would have told you everything you needed to know. Now, to follow the path of a request across a network of services, the approach must be different.
Distributed tracing lets you profile applications that are distributed across processes and machines. At each stage, the request is tagged with a consistent identifier that allows you to map out and sequence the full journey. You can also see latency at and between each service, enabling you to zero in on problem areas.
HAProxy is somewhat legendary for its level of observability. If you haven’t already, read our blog post, Introduction to HAProxy Logging. It explains how the HAProxy logs provide detailed insights about each request and response that comes through, including giving you a termination code that describes how each session ended (e.g. was the request intentionally blocked or was there a server error while getting the response?). HAProxy also has a built-in dashboard that gives you a near real-time feed of information about your proxied services. You can read more about it in our blog post, Exploring the HAProxy Stats Page.
In this post, you’ll see how HAProxy integrates with AWS X-Ray. AWS X-Ray is a hosted distributed tracing tool built into the AWS cloud services platform. It lets you track the path of a request through each service-to-service hop. The time spent at each step is recorded and displayed in a dashboard. It also provides a map showing how your services are connected.
Daniel Corbett, Director of Product at HAProxy Technologies, gave a presentation about this topic at AWS re:Invent. Check it out here:
Tracing with AWS X-Ray
Here’s how AWS X-Ray works: When a request passes through instrumented code or infrastructure, data is captured and forwarded to the X-Ray service, which is hosted on AWS. This data is in the form of segments. As found in the AWS X-Ray API documentation, segments are JSON representations of requests that a service has processed. They record information about the host, the original request, time spent doing work, and any faults and exceptions encountered. They may also include subsegments with information about calls that the application makes to upstream AWS resources, HTTP APIs, and databases that aren’t natively integrated with X-Ray.
By virtue of a unique identifier that’s attached to the request as it passes through each component, segments can be assembled into a trace. When viewed in the console, traces are visualized as charts that show time spent at each service.
X-Ray also provides a service graph that gives you an overview of how requests are flowing through your system. You can then answer questions such as: Where is latency happening? Which services are producing errors? How much is a service being utilized?
Many times, people leverage HAProxy as their edge proxy on AWS because of the additional features and flexibility that it offers. When you use HAProxy on AWS, you can add the fine-grained metrics you get with HAProxy to the high-level visibility you get with X-Ray.
Deploying the AWS X-Ray Sample Project
To get up and running fast, you can use the sample project provided by AWS. Log into the AWS console and go to the AWS X-Ray Getting Started tab. From there, you can launch a premade Node.js application that comes equipped to send X-Ray segments. Follow these steps:
- Choose to launch a sample application (Node.js).
- Follow the given instructions to deploy the application via CloudFormation and Elastic Beanstalk.
- When asked to select a VPC and subnet, choose one of the premade defaults from the lists. There’s quite a bit to set up if you decide to create your own VPC and subnet (Internet gateways, route tables, etc.), so selecting an existing VPC and subnet will be quicker.
After CloudFormation finishes building the app, look for the ElasticBeanstalkEnvironmentURL key in the Outputs tab on the CloudFormation Stacks screen.
Enter that URL into your browser to view the sample application.
After you’ve completed this tutorial and you’re ready to remove the AWS resources, you can delete the entire application by using the CloudFormation dashboard.
Getting X-Ray Segments from HAProxy
The sample project is available on GitHub. It uses Docker Compose to spin up a small environment on your local workstation. It creates a Docker container running HAProxy and another container running the X-Ray daemon and rsyslog.
By default, the load balancer proxies requests to an echo service, which is enough to send data to the X-Ray service. For a more realistic trace, change the server address in the HAProxy configuration to point to the application you deployed with CloudFormation in the previous section. HAProxy logs requests to rsyslog, which formats the data into JSON segments and sends them to the X-Ray daemon. The daemon then forwards the segments to the X-Ray service in AWS.
For the rest of this post, we’ll dig into the mechanics of how the individual pieces work together. Then, you’ll see the final result, which includes a complete trace of each request as it passes through the system.
Installing the X-Ray Daemon
The X-Ray daemon is software that listens on UDP port 2000 and forwards segment data to the X-Ray service hosted on AWS. The sample project installs it into a Docker container, but you can download and install it into the environment of your choice. It’s available as both an executable and as an RPM or Debian package.
You configure the X-Ray daemon by editing /etc/amazon/xray/cfg.yaml. You should only need to fill in the Region, which the sample project sets to us-east-2 (Ohio), and set LocalMode to true if you aren’t running the daemon on EC2. Everything else can be left as is, although you may want to set a more verbose LogLevel (e.g. dev) until you’ve ironed out any issues. If you change LogPath to save the daemon’s logs somewhere other than the default of /var/log/xray/xray.log, you’ll also need to edit its systemd unit file, /lib/systemd/system/xray.service, in which the same path is hardcoded.
Your file will look like this:
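The sketch below is based on the daemon’s default cfg.yaml; adjust Region and LogLevel to suit your environment:

```yaml
# /etc/amazon/xray/cfg.yaml (abridged)
Region: "us-east-2"                # region the daemon sends segments to
Socket:
  UDPAddress: "127.0.0.1:2000"     # where the daemon listens for segments
Logging:
  LogLevel: "dev"                  # verbose while troubleshooting; "prod" later
  LogPath: "/var/log/xray/xray.log"
LocalMode: true                    # skip EC2 instance metadata lookups off-EC2
Version: 2
```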
Next, allow the daemon to send data to the X-Ray service by creating user credentials that include the AWSXRayDaemonWriteAccess permission. If you’ve installed X-Ray as an RPM or Debian package, then a user named xray will have been created for you locally. So, save the access key and secret key to a file at /home/xray/.aws/credentials. Or, if the X-Ray daemon is running as another user, such as root, copy the credentials file to that user’s .aws directory. It will look like this:
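The credentials file uses the standard AWS format; the key values below are AWS’s documented example placeholders:

```ini
# /home/xray/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```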
Afterwards, reload the X-Ray service:
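Assuming the package installed the daemon as a systemd service named xray, that would be:

```shell
sudo systemctl restart xray
```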
Note that the sample project runs the daemon in a container and uses a bash script to set up and initialize it. The daemon logs to a file at /var/log/xray/xray.log. You should see something like this (with more or less verbosity, depending on your configuration) telling you that the service has started successfully:
In the next section, you’ll see how to configure rsyslog to ingest metrics from HAProxy and relay them to the X-Ray daemon, which in turn will send them to the hosted X-Ray service.
Setting up Rsyslog
HAProxy is designed to send its logs to a Syslog-compatible log processor, like rsyslog. The nice thing about rsyslog is that it can process each log message and convert it into X-Ray compatible JSON.
In our HAProxy configuration, we define a custom log format that delimits each field by a pipe character. We then use the Fields Extraction Module in rsyslog to split the HAProxy log line up by that designated character and then place the resulting values into a JSON template. Then, we use the Forwarding Output Module to send the generated JSON to the X-Ray daemon.
To set this up, we modify the HAProxy rsyslog configuration, which may already exist as a file named /etc/rsyslog.d/49-haproxy.conf. Here’s how it should look when everything is said and done, with some parts removed for brevity:
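Here is a sketch of such a file. It assumes mmfields defaults, where the extracted fields become $!f1, $!f2, and so on, in the order they appear in the log line; the real template in the sample project maps many more fields:

```
# /etc/rsyslog.d/49-haproxy.conf (abridged sketch)
module(load="mmfields")

# Render the extracted fields as an X-Ray segment, prefixed with the
# header line that the X-Ray daemon expects on every UDP datagram.
template(name="HAProxyXray" type="list") {
  constant(value="{\"format\": \"json\", \"version\": 1}\n")
  constant(value="{\"name\": \"haproxy\", \"trace_id\": \"") property(name="$!f2")
  constant(value="\", \"id\": \"")                           property(name="$!f3")
  constant(value="\", \"start_time\": ")                     property(name="$!f4")
  constant(value=", \"end_time\": ")                         property(name="$!f5")
  constant(value="}\n")
}

if $programname == "haproxy" then {
  # Split the pipe-delimited HAProxy log line into fields
  action(type="mmfields" separator="|")
  # Forward the rendered JSON to the X-Ray daemon over UDP
  action(type="omfwd" target="127.0.0.1" port="2000" protocol="udp"
         template="HAProxyXray")
  stop
}
```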
Restart the rsyslog service for the changes to take effect. When a request comes through HAProxy and gets logged to rsyslog, JSON is produced that looks like this:
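For example, a segment for a successful GET request might look like this (the IDs and timings here are illustrative):

```json
{"format": "json", "version": 1}
{
  "name": "haproxy",
  "id": "6e59a3a4c7d80f21",
  "trace_id": "1-5dc2ab57-0f6b3c1d2e4a5b6c7d8e9f01",
  "start_time": 1573038935.112,
  "end_time": 1573038935.164,
  "http": {
    "request": {"method": "GET", "url": "http://example.com/"},
    "response": {"status": 200}
  }
}
```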
The HAProxy Configuration
Setting things up on the HAProxy side is mostly a matter of manipulating HTTP headers and configuring a custom log format. You have two primary tasks:
- Send the HAProxy segment data, via rsyslog, to the X-Ray daemon so that the load balancer is included in the trace.
- Attach a unique identifier to each request as it passes through so that upstream services can attach the same ID to their segment data.
The first task is to send the HAProxy segment data to rsyslog. In this case, since we’re proxying HTTP traffic, we include the fields required by the X-Ray service—name, id, trace_id, start_time and end_time—and optional fields pertaining to HTTP traffic—method, status code, etc. We also include HAProxy-specific metrics such as connection counters, termination indicators, and timing values.
The second task is to add a unique identifier to the request as an HTTP header called X-Amzn-Trace-Id. This is typically done by the first component in the trace. HAProxy will likely be the first, so adding the header makes sense. Per the X-Ray documentation, a trace ID consists of three numbers separated by hyphens. Here’s an example header:
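For example (the timestamp and identifier are illustrative):

```
X-Amzn-Trace-Id: Root=1-5dc2ab57-0f6b3c1d2e4a5b6c7d8e9f01
```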
The first number is the version (set to 1). The second is the start time of the request as an eight-character, hexadecimal, Unix-epoch time. The third is a 96-bit, globally-unique identifier for the trace, represented as 24 hexadecimal digits.
Here’s how your HAProxy configuration will look:
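The listing below is an abridged sketch of that configuration. The header names, converter chains, and log-format field order are approximations of what the sample project does; consult the project on GitHub for the exact expressions:

```
frontend fe_main
    bind :80

    # Generate the X-Ray trace ID: version 1, an 8-hex-digit epoch
    # timestamp, and 24 hex digits of randomness
    unique-id-format 1-%[date,hex,bytes(8,8),lower]-%[rand,hex,bytes(8,8),lower]%[rand,hex,bytes(8,8),lower]%[rand,hex,bytes(8,8),lower]

    # Store the scheme, trace ID, and a segment ID in temporary headers
    http-request set-header X-Scheme https if { ssl_fc }
    http-request set-header X-Scheme http  if !{ ssl_fc }
    http-request set-header X-TraceId %[unique-id]
    http-request set-header X-SegmentId %[rand,hex,bytes(8,8),lower]%[rand,hex,bytes(8,8),lower]

    # Capture slots make the header values available to the log format
    declare capture request len 64
    declare capture request len 64
    declare capture request len 64
    http-request capture req.hdr(X-Scheme) id 0
    http-request capture req.hdr(X-TraceId) id 1
    http-request capture req.hdr(X-SegmentId) id 2

    # Propagate the trace ID to upstream services
    http-request set-header X-Amzn-Trace-Id Root=%[req.hdr(X-TraceId)]

    # Pipe-delimited log format, ingested by rsyslog
    log-format "%ID|%[capture.req.hdr(0)]|%[capture.req.hdr(2)]|%HM|%HP|%ST|%Ts|%TR|%Tw|%Tc|%Tr"

    default_backend servers
```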
Let’s break this down to better understand what it does. The http-request set-header lines store the scheme (http or https), trace ID, and segment ID in custom HTTP request headers, which are a convenient place to store data.
The trace ID is the unique identifier for the entire trace. The segment ID distinguishes the current segment from others. Both are populated using the same combination of sample fetches and converters:
- rand returns an integer between 0 and 2^32
- hex converts it to a hex string
- bytes extracts eight digits from this hex string
We do that three times for X-TraceId to get a 96-bit identifier as 24 hexadecimal digits, and only twice for the X-SegmentId headers, since segment IDs are 64 bits (16 hexadecimal digits). Note that we’re creating several segment IDs because we’re going to send subsegments that contain additional timings: the client request time, queued time, server connect time, and server response time.
The declare capture and http-request capture lines create capture slots, or places where you can store data that you wish to log, and fill them with the values of the request headers. Note that we set an id parameter for each, which is how the fields can be referenced later in the log format.
Next, we set the X-Amzn-Trace-Id request header. The unique-id-format directive configures the format of the ID and unique-id generates a new one. The http-request set-header line that follows inserts the identifier into the X-Amzn-Trace-Id header as the Root field.
Then, we configure the format of the log messages, which will be ingested by rsyslog. We’ve set the delimiter to be a pipe symbol.
Captured log lines look like this, before being converted to JSON:
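A line might look something like the following; the exact field order depends on your log-format directive, and all values here are illustrative:

```
1-5dc2ab57-0f6b3c1d2e4a5b6c7d8e9f01|https|6e59a3a4c7d80f21|GET|/|200|1573038935|0|0|1|52
```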
In the log format, %ID returns the unique identifier used as the trace ID. If you like, you can remove the temporary headers that were set, before forwarding the request to the backend, by using the http-request del-header directive:
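For example, assuming temporary headers named X-TraceId and X-SegmentId, as above, plus an illustrative X-Scheme header:

```
http-request del-header X-Scheme
http-request del-header X-TraceId
http-request del-header X-SegmentId
```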
Finally, in the backend, you should have a server directive that points to the public address of your proxied service: the value of ElasticBeanstalkEnvironmentURL from the sample X-Ray application that you set up at the beginning of this tutorial.
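For example (the hostname below is a placeholder for your own Elastic Beanstalk URL):

```
backend servers
    server app1 sample-env.us-east-2.elasticbeanstalk.com:80
```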
The Final Result
Once everything is installed and configured, start sending requests to your HAProxy frontend. Then, head over to the AWS console and see the X-Ray service map, noting that it includes HAProxy! Traces will show load balancer metrics including server response time, time queued, server connect time, and client request time.
In this blog post, you learned how the AWS X-Ray service allows you to create distributed traces that help with troubleshooting, assessing service utilization, and zeroing in on latency. Oftentimes, the HAProxy load balancer is used in AWS because of its advanced features and flexibility. With this solution, you’re able to integrate HAProxy with X-Ray, giving you a richer state of observability.
Want to stay in the loop on similar topics? Subscribe to this blog! You can also follow us on Twitter and join the conversation on Slack. Want to learn more about HAProxy Enterprise? It provides the utmost performance, reliability, and security at any scale and in any environment. It comes with the Real-time Dashboard, which provides added observability across your HAProxy cluster. Request a free trial today or contact us to learn more!