Routing HTTP requests
Content switching is the ability to route HTTP requests based on any information available in the HTTP protocol
(URL and headers).
This document explains how to do it in HAProxy, as embedded in the ALOHA load balancer v4.2 and later.
Content switching helps improve application scalability and makes web architectures more flexible.
It can also be used to improve the security and reliability of web platforms, or when applications hosted on the same server require different settings.
Typical use cases and per-application settings include:
- virtual hosting: routing requests based on website host name
- application hosting: routing requests based on url path
- categorizing HTTP traffic, like static and dynamic
- resource allocation based on user category (authenticated or not)
- health check
- connection mode
- queue management
(Non-exhaustive list)
Content switching rules are only partially compatible with HAProxy’s tunnel mode. Please read the memo named “HTTP connection mode” to know more about HAProxy’s modes.
In tunnel mode, only the first request of a connection can be routed. All the data that follows on this connection is forwarded as payload in the tunnel established by HAProxy between the client and the server.
ALOHA 4.2 to 5.5
By default, the ALOHA is configured in server-close mode, which is compatible with the information provided in this document.
If you need to enable tunnel mode on this ALOHA version, all the traffic of a connection is routed based on the first request of that connection. This can be sufficient in most cases.
ALOHA 6.0 and above
ALOHA 6.0 introduced a new connection mode: http-keep-alive. This mode is compatible with NTLM since it keeps the connection open on the server side, but it is also able to analyse all the content exchanged on this connection. This means it can route any request from an established connection to another server when required.
The tunnel mode is still available, under the name http-tunnel, and can be used in very rare cases.
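As a rough sketch, assuming the standard HAProxy option names and a defaults section (neither appears in this document), the connection mode could be selected as follows. Only one mode is normally active at a time, so the alternatives are left as comments:

```haproxy
defaults
    mode http
    # server-close: every request of a client connection is analysed,
    # so content switching rules apply to each of them
    option http-server-close
    # ALOHA 6.0 and above: also keep the server side connection open
    # option http-keep-alive
    # tunnel: only the first request of a connection is routed
    # option http-tunnel
```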
Usually, the content switching ability is associated with reverse proxies.
HAProxy typically works as a reverse proxy. In your configuration, you define entry points, called frontends, and outgoing points, called backends. Content switching is the way to choose which backend an HTTP request is routed to.
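To make the frontend and backend vocabulary concrete, here is a minimal sketch of such a configuration. The names ft_http, bk_app and bk_default, the /app/ path and the server addresses are placeholders chosen for illustration:

```haproxy
# entry point: where clients connect
frontend ft_http
    mode http
    bind 0.0.0.0:80
    # content switching: requests whose path starts with /app/ go to bk_app
    acl is_app path_beg /app/
    use_backend bk_app if is_app
    # everything else
    default_backend bk_default

# outgoing points: where requests are sent
backend bk_app
    mode http
    server app1 192.168.10.21:80 check

backend bk_default
    mode http
    server web1 192.168.10.22:80 check
```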
Content Switching in the ALOHA Load-Balancer
HAProxy’s internal routing process
In the ALOHA, the software responsible for processing HTTP is HAProxy, so content switching rules are configured in HAProxy.
From an HTTP point of view, HAProxy is split into two main components:
- frontend: manages everything related to the client side
- backend: manages everything related to the server side
Basically, when a client sends a request, it is first processed by the frontend. Then, based on its rules, the frontend routes the request to a backend.
Routing decisions can be based on the following items:
- any string or regexp matching in HTTP headers
- any string or regexp matching in the URL (including scheme and protocol version)
- any value of a query string parameter
- any file extension
- cookie values
- SSL protocol
- HAProxy’s internal status (farm capacity, etc…)
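A few of these items can be sketched as acls, to be placed in a frontend. The acl names, the cookie name SESSIONID and the patterns are hypothetical, and the cookie and SSL fetches are assumed to be available in your release (they are not in the oldest ones):

```haproxy
    # regexp matching in an HTTP header
    acl ua_bot hdr_reg(User-Agent) -i (bot|crawler|spider)
    # value of a query string parameter, e.g. /search?lang=fr
    acl lang_fr urlp(lang) fr
    # presence of a session cookie
    acl has_session cook(SESSIONID) -m found
    # SSL/TLS protocol negotiated with the client (SSL-enabled bind only)
    acl old_tls ssl_fc_protocol SSLv3 TLSv1
```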
Content switching rule profile
In HAProxy, content switching rules are made of 2 components:
- an acl, which fetches samples from the current traffic and matches them against patterns
- a routing decision, which points to a backend if the associated acl(s) are true
Below is the prototype of a rule:
acl <acl name> <fetch> <patterns>
use_backend <backend name> if / unless <acl name>
- acl: HAProxy keyword marking the beginning of a new matching rule
- <acl name>: a word (underscore ’_’ and hyphen ’-’ allowed) naming the acl which will be used as a label (or pointer) later in the configuration
- <fetch>: sample to be extracted
- <patterns>: values to be compared to the sample fetched
- use_backend: HAProxy keyword indicating that a routing decision may occur if the acl matches
- <backend name>: the name of the backend to route to
- if / unless: keyword to tell whether to match (or not) an acl
- <acl name>: the name of the acl to get the matching result from
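Filled in with concrete values, the prototype could look like the fragment below. The acl name is_admin and the backends bk_admin and bk_public are made up for the example and would have to be declared elsewhere in the configuration:

```haproxy
frontend ft_http
    mode http
    bind 0.0.0.0:80
    # <acl name> = is_admin, <fetch> = path_beg, <patterns> = /admin/
    acl is_admin path_beg /admin/
    # routed when the acl is true
    use_backend bk_admin if is_admin
    # routed when the acl is false
    use_backend bk_public unless is_admin
```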
Rules about HAProxy’s content switching
When working with acl and use_backend in HAProxy, it is important to keep in mind the following rules:
- Many patterns can be provided on a single acl line: a logical OR is applied between them
- Many acls can have the same name: a logical OR is applied between them
- An acl only returns TRUE or FALSE: TRUE when the fetched sample matches any of the patterns, FALSE otherwise
- When a frontend contains several use_backend rules, the first one whose condition matches is used
- A use_backend rule can depend on several acls: list the acl names one after the other and a logical AND is implicitly applied between them. An explicit OR (keyword or, or ||) is also available.
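The fragment below is one way to see these rules at work together. All acl and backend names are hypothetical:

```haproxy
frontend ft_rules
    mode http
    bind 0.0.0.0:80
    # many patterns on one line: true if the path ends with .gif OR .jpg
    acl static path_end .gif .jpg
    # same acl name declared again: ORed with the line above
    acl static path_beg /images/
    acl host_cdn hdr(Host) -i cdn.domain.tld
    acl is_post method POST

    # acl names appended: implicit AND (POST requests on static content)
    use_backend bk_reject if static is_post
    # explicit OR between two acls
    use_backend bk_static if static or host_cdn
    # used only when no use_backend rule above matched
    default_backend bk_dynamic
```

Because the first matching use_backend wins, a POST request on /images/logo.gif is sent to bk_reject even though it also satisfies the second rule.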
There are many fetch methods available in HAProxy, and each new ALOHA release comes with new ones.
To know which fetches are available in your ALOHA, open the WUI, go to the LB Layer7 tab and click the help button in the top right corner of the text area. Then use the search engine to look for the string “Matching at Layer 7” or “Fetching HTTP samples” (ALOHA 6.0 and above).
ALOHA 4.2 to 5.5
The most common fetches are:
- src: fetch source IP address
- nbsrv: number of available servers in a farm
- method: request HTTP method
- hdr: fetch a header value
- path: fetch a url path (query string excluded)
- url: fetch the whole URL as presented in the request line, query string included (the HTTP method and the protocol version are not part of it)
- urlp: fetch the first occurrence of a URL parameter in a query string
This firmware release introduced the following new sample fetches:
- base: fetch the concatenation of the first Host header and the path of the url (until the question mark)
- cook: fetch a cookie value
(Non-exhaustive list)
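As an illustration, the fetches listed above could be used in acls such as the following. Names, addresses and patterns are placeholders, and the last two lines only apply to releases that provide base and cook:

```haproxy
    # client source address
    acl from_lan src 192.168.0.0/16
    # request HTTP method
    acl is_post method POST
    # value of the Host header
    acl host_www hdr(Host) -i www.domain.tld
    # exact URL path, query string excluded
    acl is_login path /login
    # first occurrence of the "lang" parameter in the query string
    acl lang_fr urlp(lang) fr
    # Host header and path concatenated, matched exactly
    acl blog_home base www.domain.tld/blog/
    # presence of a cookie named JSESSIONID
    acl has_session cook(JSESSIONID) -m found
```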
By default, any fetch method applies to the whole targeted object. Sometimes, we want to fetch samples at a specific location in the object, or match them in a different way.
You can suffix the fetch method with one of the keywords below to change the location or the matching method:
- _beg: prefix
- _cnt: number of occurrences
- _dir: directory (slashes are implicit, no need to declare them)
- _dom: domain name
- _end: suffix
- _len: length
- _reg: PCRE regex
- _sub: substring
For example:
- hdr_sub: fetch a substring in a header
- hdr_reg: fetch a regular expression in a header
- path_beg: fetch the beginning of the url path
- path_end: fetch file extension (basically, the end of the url)
- path_dir: fetch a directory in the path
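A short sketch of suffixed fetches used in acls follows. The acl names and the thresholds are arbitrary choices made for illustration:

```haproxy
    # substring of a header, case-insensitive
    acl ua_mobile hdr_sub(User-Agent) -i mobile
    # domain name: matches domain.tld and www.domain.tld
    acl host_tld hdr_dom(Host) -i domain.tld
    # prefix of the URL path
    acl api_path path_beg /api/
    # directory anywhere in the path, slashes are implicit
    acl in_static path_dir static
    # suffix of the URL path
    acl is_php path_end .php
    # number of occurrences of a header (integer comparison)
    acl many_xff hdr_cnt(X-Forwarded-For) gt 3
    # length of the URL (integer comparison)
    acl long_url url_len gt 2048
    # regular expression on the URL path
    acl versioned path_reg ^/v[0-9]+/
```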
Sample / pattern types
There are many ways to match a pattern against a fetched sample, depending on the type of the sample.
When a <fetch> returns an integer, we may want to know if the number returned is lower than, greater than or equal to a pattern.
The keywords below allow these comparisons:
- eq: true if the fetched sample is equal to the pattern
- ge: true if the fetched sample is greater than OR equal to the pattern
- gt: true if the fetched sample is greater than the pattern
- le: true if the fetched sample is lower than OR equal to the pattern
- lt: true if the fetched sample is lower than the pattern
For example, test whether the number of available servers in the backend bk_web is lower than 2:
acl low_capacity nbsrv(bk_web) lt 2
The acl above returns true when the number of available servers in the farm bk_web is LOWER than 2 (thus 1 or 0).
It can be used to route traffic to a sorry backend if you know your application requires at least 2 servers to handle the load properly.
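A possible way to wire this acl into a frontend is sketched below. The frontend name ft_web, the sorry backend bk_sorry and all addresses are assumptions made for the example:

```haproxy
frontend ft_web
    mode http
    bind 0.0.0.0:80
    # true when fewer than 2 servers are available in bk_web
    acl low_capacity nbsrv(bk_web) lt 2
    use_backend bk_sorry if low_capacity
    default_backend bk_web

backend bk_web
    mode http
    balance roundrobin
    server srv1 192.168.10.11:80 check
    server srv2 192.168.10.12:80 check
    server srv3 192.168.10.13:80 check

# serves a static "please come back later" page
backend bk_sorry
    mode http
    server sorry1 192.168.10.30:80 check
```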
When a <fetch> returns a string, we may want to compare it with other strings.
- By default, a raw case-sensitive comparison is performed
- It is possible to make the comparison case-insensitive by adding the flag “-i” before the patterns
- If a pattern contains a space character, then the space must be escaped with a backslash (’\’)
For example, check if the Host header is ’www.domain.tld’:
acl host_www.domain.tld hdr(Host) www.domain.tld
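The two other points, the “-i” flag and the escaped space, can be illustrated as follows. The acl names and the User-Agent value are invented for the example:

```haproxy
    # case-insensitive: also matches WWW.Domain.TLD
    acl host_www hdr(Host) -i www.domain.tld
    # pattern containing spaces, each one escaped with a backslash
    acl ua_probe hdr(User-Agent) My\ Monitoring\ Agent
```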
Regular expressions can be used to match any type of content.
The comparison is case-sensitive by default. It can be turned case-insensitive by adding the flag “-i” before the regexp.
If a regexp contains a space character, then it must be escaped by a backslash (’\’).
For example, check if the Host header looks like ’*.domain.tld’ (the escaped dot before the domain prevents a match on a host such as ’otherdomain.tld’):
acl host_domain.tld hdr_reg(Host) .*\.domain\.tld$
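Two more regular expression acls, given as a sketch with hypothetical names and patterns, show the “-i” flag and an escaped space:

```haproxy
    # case-insensitive alternation
    acl ua_bot hdr_reg(User-Agent) -i (bot|crawler|spider)
    # the space between "MSIE" and the version number is escaped
    acl ua_msie hdr_reg(User-Agent) MSIE\ [0-9]+
```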
IPv4 and IPv6 addresses
ALOHA 5.5 and above can match both IPv4 and IPv6 addresses.
IP addresses can be provided either as a single host address or as a subnet in CIDR notation. The acl matches if the sample is equal to the host address or belongs to the subnet.
For example, check if the client IP belongs to the users subnet:
acl users_subnet src 10.0.0.0/24
The acl returns true if the client IP belongs to the supplied subnet.
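Host addresses, subnets and, on releases that support it, IPv6 prefixes can be mixed. In the sketch below, the acl names, the addresses and the backend bk_intranet are placeholders:

```haproxy
    # IPv4 subnet in CIDR notation
    acl users_subnet src 10.0.0.0/24
    # single host address
    acl admin_host src 10.0.1.10
    # IPv6 prefix
    acl users_v6 src 2001:db8::/32
    # any of the three sources is routed to the intranet farm
    use_backend bk_intranet if users_subnet or admin_host or users_v6
```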
Content switching examples
Static and dynamic traffic split
When an application is hosted under a single host name, one way to split requests for static and dynamic content is to use either the URL path or the file extension.
The diagram below illustrates this case. The ALOHA gets all the traffic and splits it between 2 farms.
Note that a single server could be used to deliver both static and dynamic content.
Splitting the traffic that way makes it possible to manage each farm individually, with different types of queueing and health checks.
All the website traffic reaches this frontend. HAProxy takes routing decisions based on layer 7 information.
frontend ft_websites
    mode http
    bind 0.0.0.0:80
    log global
    option httplog
    # detect static content by file extension or URL path
    acl static path_end .gif .jpg .jpeg .png
    acl static path_end css js
    acl static path_beg /images/ /users/
    acl static path_dir static
    use_backend bk_static if static
    # default route
    default_backend bk_dynamic
# static farm configuration
backend bk_static
    mode http
    balance roundrobin
    option forwardfor
    # dedicated health check for static content
    option httpchk HEAD /images/pixel.png
    default-server inter 3s rise 2 fall 3 slowstart 0
    server srv1 192.168.10.11:80 weight 10 maxconn 1000 check
    server srv2 192.168.10.12:80 weight 10 maxconn 1000 check

# dynamic farm configuration
backend bk_dynamic
    mode http
    balance roundrobin
    cookie SERVERID2 insert indirect nocache
    option forwardfor
    # dedicated health check for dynamic content
    option httpchk GET /check.php
    http-check expect string OK
    default-server inter 3s rise 2 fall 3 slowstart 0
    server srv1 192.168.10.11:8080 cookie s1 weight 10 maxconn 100 check
    server srv2 192.168.10.12:8080 cookie s2 weight 10 maxconn 100 check
It is very common to use virtual hosting based on the website host name when we own very few public IP addresses and need to give access to a very large number of applications.
Basically, the ALOHA load-balancer is used as a reverse proxy in such case.
In the example below, there are 2 domains pointing to the ALOHA public IP address. Depending on the domain name, the ALOHA Load-Balancer decides which farm to use.
All the website traffic reaches this frontend. HAProxy takes routing decisions based on layer 7 information.
frontend ft_websites
    mode http
    bind 0.0.0.0:80
    log global
    option httplog
    # Capturing the Host header in logs is important for
    # troubleshooting, to know whether rules match or not
    capture request header host len 64
    # site1 routing rules based on Host header
    acl site1 hdr_end(host) site1.com site1.eu
    use_backend bk_site1 if site1
    # site2 routing rules based on Host header
    acl site2 hdr_reg(host) site2\.(com|ie)$
    use_backend bk_site2 if site2
    # default route
    default_backend bk_default
Each virtual host has its own backend with its specific configuration (mind the health check).
# site1 backend configuration
backend bk_site1
    mode http
    balance roundrobin
    cookie SERVERID1 insert indirect nocache
    option forwardfor
    # dedicated health check for site1
    option httpchk HEAD /check.php HTTP/1.1\r\nHost:\ www.site1.com
    default-server inter 3s rise 2 fall 3 slowstart 0
    server srv1 192.168.10.11:80 cookie s1 weight 10 maxconn 1000 check
    server srv2 192.168.10.12:80 cookie s2 weight 10 maxconn 1000 check

# site2 backend configuration
backend bk_site2
    mode http
    balance roundrobin
    cookie SERVERID2 insert indirect nocache
    option forwardfor
    # dedicated health check for site2
    option httpchk HEAD /check.jsp HTTP/1.1\r\nHost:\ www.site2.com
    default-server inter 3s rise 2 fall 3 slowstart 0
    server srv1 192.168.10.13:80 cookie s1 weight 10 maxconn 1000 check
    server srv2 192.168.10.14:80 cookie s2 weight 10 maxconn 1000 check
All requests which did not match the content switching rules are routed to this backend:
backend bk_default
    mode http
    balance roundrobin
    option forwardfor
    option httpchk HEAD /
    default-server inter 3s rise 2 fall 3 slowstart 0
    server srv1 192.168.10.8:80 weight 10 maxconn 1000 check
    server srv2 192.168.10.9:80 weight 10 maxconn 1000 check
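Application hosting based on the URL path, mentioned in the list of use cases at the beginning of this document, can be combined with virtual hosting. The sketch below assumes two hypothetical applications, /blog/ and /shop/, each with its own backend (bk_blog and bk_shop, to be declared like the backends above):

```haproxy
frontend ft_apps
    mode http
    bind 0.0.0.0:80
    # virtual host
    acl site1 hdr_end(host) site1.com
    # one application per top-level directory
    acl app_blog path_beg /blog/
    acl app_shop path_beg /shop/
    # the host name AND the path must both match
    use_backend bk_blog if site1 app_blog
    use_backend bk_shop if site1 app_shop
    # default route
    default_backend bk_default
```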