Let’s start at the beginning: we work closely with a lot of Secure Web Gateway (SWG) vendors, mainly because a long time ago we figured out a nice solution to the key requirement of scaling web filters:
All Secure Web Gateways require client IP transparency in order to implement any proper kind of security and authentication, which can be a real pain if you have a load balancer in the middle.
One of my colleagues, Neil Hosking, has written an excellent blog describing how we usually solve the transparency issue for all the major SWG vendors — whatever deployment mode they choose.
Anyway, back to the story…
The support call seemed simple enough
A Managed Service Provider customer of ours with multiple VDI deployments had been happily using our load balancers for a large clustered web filter for several years. Then after a bunch of network changes they started getting strange behaviour. The load balancing had become uneven, and sometimes servers would keep taking traffic — even when they had been put in maintenance mode.
My first educated guess was that they’d misconfigured a server.
They were load balancing the SWGs in explicit mode with Direct Server Return. Did I mention that we love DSR mode? The most common problem we see in DSR mode is a server that hasn’t been configured correctly for the ARP problem: every real server has to accept traffic for the virtual IP without answering ARP requests for it.
But after carefully checking and testing all the settings we couldn't find anything wrong. Everything appeared to be working 90% of the time. Don’t you love diagnosing problems like that?
OK, so how can I isolate what looks like a routing problem?
I was clutching at straws a little bit when I suggested that we configure a separate test cluster using a simple Layer 7 HAProxy configuration. This obviously wouldn't be source IP transparent, but it would at least help us figure out if this was a switch/routing problem.
Annoyingly, this worked fine; a failed test would have been more useful for diagnosing the network routes. And, of course, this workaround was not source IP transparent (which is the key requirement, remember?).
It was around this time that I had another idea...
Hang on, the Diladele SWG is based on Squid, isn't it?
During the conversation, the customer mentioned how much they liked the SWG they were using because it was based on open source technology. Diladele is a web proxy, meaning it can be used to filter out undesirable content according to rules defined by a system administrator. It's extremely flexible and based on Squid.
This gives us another possible solution...
As Diladele is, like us, built on open source technology (Squid proxy, ClamAV antivirus, Apache web server, etc.), it’s possible to make use of the PROXY protocol. If the reverse proxy (in this case the load balancer) sends the PROXY protocol header, you can see the connecting machine’s IP address directly in the Diladele interface!
Here is where we can observe the problem we need to solve. In the Diladele web interface, navigate to “Traffic Monitor” → “REAL TIME” and you’ll see the IP address of the load balancer, not that of the connecting client machine.
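To make that a bit more concrete, here is a minimal Python sketch of what a receiver does with a PROXY protocol version 1 header: it reads one line of text that arrives ahead of the HTTP request and pulls the original client address out of it. The names `ProxyInfo` and `parse_proxy_v1` and the example addresses are ours, purely for illustration; this is not Squid's or Diladele's actual code.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    family: str    # "TCP4" or "TCP6"
    src_ip: str    # the real client address
    dst_ip: str    # the address the client connected to
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxyInfo]:
    """Parse one PROXY protocol v1 header line.

    Returns None for "PROXY UNKNOWN", where the sender has no
    address information to share.
    """
    # The v1 header is a single ASCII line of at most 107 bytes, CRLF included.
    if len(line) > 107 or not line.endswith(b"\r\n"):
        raise ValueError("not a valid PROXY v1 header line")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("missing PROXY signature")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return None
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    family, src, dst, sport, dport = parts[1:]
    # ip_address() raises ValueError if either address is not valid.
    ipaddress.ip_address(src)
    ipaddress.ip_address(dst)
    return ProxyInfo(family, src, dst, int(sport), int(dport))


# Example addresses are made up.
info = parse_proxy_v1(b"PROXY TCP4 192.168.0.10 10.0.0.5 51234 3128\r\n")
print(info.src_ip)  # 192.168.0.10
```

The point to take away is that the client address travels inside the connection itself, so nothing about the network routing has to change for the proxy to learn it.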
Adding proxy protocol to overcome this limitation:
1. You need to tell Diladele to expect the proxy protocol header. Once you do this, you can no longer connect without using proxy protocol.
a) Navigate to “Squid Proxy” → “SETTINGS”.
b) Find the section that mentions “PROXY protocol” (it might differ slightly from this screenshot) and enable it.
c) Input the IP addresses of each load balancer separated by a space in the “trusted load balancers” box.
d) Save Changes
e) Apply New Settings and Restart the ICAP server.
f) All being well, you should have a successful restart.
2. Head over to the Loadbalancer.org WebUI. Navigate to “Cluster Configuration” → “Layer 7 - Virtual Services”.
3. Click “Modify” next to the WebSafety virtual service.
4. Scroll down to “Other” and expand the “[Advanced]” section.
5. Select the “Send Proxy Protocol” option and set this to “Send Proxy V1”.
6. Click “Update”.
7. Click “Reload HAProxy”.
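For the curious, the “Send Proxy V1” setting means the load balancer writes one extra line of text at the start of each connection to the proxy. The sketch below shows roughly what that line looks like; `build_proxy_v1` is a hypothetical helper of ours and the addresses are made up, so treat it as an illustration of the wire format rather than HAProxy's implementation.

```python
import ipaddress


def build_proxy_v1(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build the PROXY protocol v1 line sent ahead of the client's data."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must be the same IP family")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError("port out of range")
    family = "TCP4" if src.version == 4 else "TCP6"
    # Field order is: source IP, destination IP, source port, destination port.
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


# A client at 192.168.0.10 connecting to the virtual service on 10.0.0.5:3128.
print(build_proxy_v1("192.168.0.10", 51234, "10.0.0.5", 3128))
# b'PROXY TCP4 192.168.0.10 10.0.0.5 51234 3128\r\n'
```

The HTTP request from the client follows this line unchanged, which is why the proxy has to be told to expect it first.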
Once that is done, the next requests that come through will show the real client IP addresses in the Traffic Monitor instead of the load balancer's.
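It is worth pausing on why step 1c asks for a list of trusted load balancers. Anyone who can connect to the proxy could write their own PROXY header and claim to be any client they like, so the receiver should only believe the header when the connection really comes from a load balancer. A rough Python sketch of that check, assuming the same space-separated list format as the settings box (the function names and addresses here are hypothetical, not Diladele's code):

```python
import ipaddress

# Hypothetical load balancer addresses, space separated as in the settings box.
TRUSTED_LOAD_BALANCERS = "192.168.1.20 192.168.1.21"


def parse_trusted_list(value: str):
    """Turn a space-separated list of IPs (or CIDR ranges) into network objects."""
    return [ipaddress.ip_network(item) for item in value.split()]


def is_trusted(peer_ip: str, trusted) -> bool:
    """True if the TCP peer (not the address in the header) is a known load balancer."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in trusted)


trusted = parse_trusted_list(TRUSTED_LOAD_BALANCERS)
print(is_trusted("192.168.1.20", trusted))  # True
print(is_trusted("192.168.1.99", trusted))  # False
```

This is also why direct connections stop working once the option is enabled: a client that does not send the header, or is not on the list, has nothing the proxy is prepared to trust.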
Wow, that's another great technique to balance explicit web filters with the required level of transparency :-).
Hang on... why on earth did a working system suddenly break?
Well, as is often the case, it was nothing to do with the load balancer. Several hours after we found this workaround for the customer, they came back to us saying they had discovered the problem. One of the switches on the network had gone haywire after a security update and, among other issues, was randomly corrupting its ARP cache. After re-flashing the switch, the original cluster settings and servers started working perfectly in DSR mode.
The customer decided to stick with DSR mode (because it's very, very fast), but was impressed with the workaround option!