Why Layer 7 load balancing sucks...


Load balancing hardware marketing execs get very excited about the fact that their product can magically scale your application by using amazing Layer 7 technology in the load balancer, such as cookie inserts and tracking/re-writing. What they fail to mention is that any application that requires the load balancer to keep track of session related information within the communications stream can never be scalable or reliable.

Before we go further I should point out that my opinion may be a minority one:

Willy Tarreau, who is practically a savant, has written an excellent argument on behalf of Layer 7 Load Balancers.
And Lori MacVittie, a much better writer than me, has written a well informed rebuttal to this blog.

If your application needs sticky sessions — What happens when your server dies?

"Remind me — Why exactly do sticky sessions help me?"

Let's step back a minute and think about what we are trying to achieve with our load balancing solution:

Are we just looking to increase the performance of our application by adding more application servers?
Or are we trying to achieve true scalability and true horizontal scaling to our application?

I would hope that you are trying to achieve scalability. This will allow you to cope with inevitable changes in demand for the application as well as enable simple maintenance. True scalability of a system enables you to be comfortable in the knowledge that you have a plan when your traffic increases 100 fold. Google is spending a lot of man hours and dollars finding a way to scale their current (fairly large) cluster. They are somewhat in a league of their own when they are trying to scale a system by 100 times when they already have an estimated 4 million cores.

And how do I know Google is using Layer 4 load balancing and not Layer 7?

Because they are not stupid, that’s why!
OK, so I don’t know 100% for a fact, but I'm pretty sure Google uses layer 4 load balancing (Maglev) mixed with a lot of BGP stuff for global distribution. Stack loads of read-only sharded and partitioned MySQL replicas, cached by stack loads of sharded and partitioned index servers (memcached maybe) + stack loads of clustered file systems for object storage… probably worth several blogs in itself…

The reasons for Google doing this should become more obvious as I continue rambling....

A lot of talks about load balancing start off by saying that originally people used multiple DNS ‘A’ records to allow round robin access to a bunch of web servers to increase scalability. They will then go on to explain that this was rubbish because it didn’t have health checking, server weighting, feedback agents or cookies. Which is kind of obvious, but it's not too hard to add health checking to your DNS server. What do you think a GSLB (Global Server Load Balancer) is? A lot of large scale production systems with enormous traffic still use this method because:

  • It’s simple
  • It works
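
To make the health checking point concrete, here is a minimal Python sketch of round robin answers that skip dead servers. It is not how any particular DNS server or GSLB does it. The function name `healthy_round_robin`, the `is_healthy` callback and the 192.0.2.x addresses (a documentation-only range) are all made up for illustration.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator


def healthy_round_robin(
    addresses: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Iterator[str]:
    """Yield addresses in rotation, skipping any that fail the health check."""
    pool = list(addresses)
    if not pool:
        raise ValueError("no addresses configured")
    skipped = 0
    for address in cycle(pool):
        if is_healthy(address):
            skipped = 0
            yield address
        else:
            skipped += 1
            # A full lap without one healthy server: give up rather than spin.
            if skipped >= len(pool):
                raise RuntimeError("no healthy addresses left")


# Hypothetical example: the middle server is down.
down = {"192.0.2.11"}
answers = healthy_round_robin(
    ["192.0.2.10", "192.0.2.11", "192.0.2.12"],
    is_healthy=lambda ip: ip not in down,
)
first_four = [next(answers) for _ in range(4)]
```

In a real deployment the health check would be a ping or an HTTP GET against each server rather than a set lookup, but the rotation logic stays about this simple.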

I just realized that technically DNS round robin IS a Layer 7 Load Balancer… oh well that’s marketing for you.

Anyway, most people started using little black boxes from Cisco, Alteon etc. that were effectively simple routers. They called them layer 4 routers because as well as doing standard layer 3 router stuff they would also do application health checks such as ping or HTTP GET etc. The nice thing about these little 486 class boxes with 32MB RAM is that they could handle tens of thousands of connections without breaking a sweat.

NB. The Loadbalancer.org entry level appliance is a P4 3.4GHz with 8GB RAM... so you can see why we don't bother up-selling you to faster hardware.

However, not content with a technology that was so boring because ‘it just worked’, various marketing departments came up with the idea that rather than bothering to make your application scalable you should just slap some sticky tape on the load balancer at the front end that ensures a client's connections always go to the same server. What they did was put a proxy application on the load balancer that allowed it to terminate communication streams and read or modify them, i.e. insert cookies so that it would know which server in the group to send the connection to. This introduced a whole heap of issues: a horrendous architecture design could now be sticky plaster fixed by the load balancer vendor, who in the process could charge the customer more money for the privilege :-).

I have nothing against cookies (especially chocolate chip) but they should be in either:

  • The Database
  • Memcached (or something similar)

NOT on the load balancer. If your cookies are on the load balancer, then they are totally useless to the application and therefore you can’t get a session to fail over to another server in the cluster.
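
Here is a rough Python sketch of that failure mode, under some loud assumptions: `AppServer`, `StickyProxy` and the `SERVERID` cookie name are invented for illustration and do not model any specific product. Each server keeps its sessions in local memory, and the proxy only remembers which server a client is pinned to.

```python
import itertools
from typing import Dict, List, Optional


class AppServer:
    """Hypothetical app server that keeps session data in local memory only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.sessions: Dict[str, List[str]] = {}  # gone if this server dies

    def handle(self, client_id: str, item: Optional[str] = None) -> List[str]:
        basket = self.sessions.setdefault(client_id, [])
        if item is not None:
            basket.append(item)
        return list(basket)


class StickyProxy:
    """Hypothetical layer 7 proxy that pins a client to a server via a cookie."""

    def __init__(self, servers: List[AppServer]) -> None:
        self.servers = servers
        self._rotation = itertools.cycle(servers)

    def route(self, cookies: Dict[str, str]) -> AppServer:
        by_name = {server.name: server for server in self.servers}
        pinned = by_name.get(cookies.get("SERVERID", ""))
        if pinned is None or not pinned.alive:
            if not any(server.alive for server in self.servers):
                raise RuntimeError("no live servers")
            # Re-pin to the next live server. The cookie moves, the session does not.
            pinned = next(s for s in self._rotation if s.alive)
            cookies["SERVERID"] = pinned.name
        return pinned


web1, web2 = AppServer("web1"), AppServer("web2")
proxy = StickyProxy([web1, web2])
cookies: Dict[str, str] = {}

proxy.route(cookies).handle("alice", "book")  # pinned to web1
proxy.route(cookies).handle("alice", "lamp")  # still web1
web1.alive = False                            # the server dies
basket_after_failure = proxy.route(cookies).handle("alice")
```

After the failure the proxy happily re-pins the client to `web2`, but `basket_after_failure` comes back empty, because the cookie only ever said where the session lived, not what was in it.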

Small caveat before I get flamed: Yes, I know some applications are legacy and badly written (i.e. without a persistent data store), but that doesn’t make cookies on load balancers any more elegant than a sticky plaster.

Layer 7 Reverse Proxy vs Layer 4 Load Balancing Router:

Layer 4 Load Balancer aka. Router:

  • Scalable
  • Reliable
  • High Performance
  • Load Balancing

Layer 7 Load Balancer aka. Reverse Proxy:

  • Not Scalable
  • Low performance (needs very powerful hardware)
  • Far more complex code base
  • Terminates Connections and therefore needs custom code for each protocol + security headache
  • Becomes a single point of failure (requires bandwidth for syncing session state with backup device)
  • Real Servers in cluster by definition can’t fail over and therefore:
    • NOT conducive to High-Availability
    • Load is distributed but definitely not balanced

Another caveat: OK, so I’m being a bit mean. Yes, you can minimize virtually all of the downsides to layer 7 load balancing with modern devices, good planning etc. You can apply enough sticky plaster bits of code to get the whole thing working as expected. You could use layer 4 load balancers to balance the load over multiple layer 7 units to get over the performance issues and…

Hang on a minute, if the application handled persistence correctly then surely we wouldn’t need all this crap?

KISS (Keep It Simple Stupid)

Surely any web application developer in the whole wide world now understands that any persistent data should be available to all nodes in the cluster?
One of the first things a web developer should decide is where the application state data is going to be held.

What the developer should do with session data:

  • Put it in a database (clustered or replicated of course)
  • Put it in a memory cache pool (clustered or replicated of course)
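
As a minimal sketch of what that buys you (in Python, with everything invented for illustration): `SharedSessionStore` is an in-memory stand-in for a replicated database or a memcached-style pool, and `StatelessAppServer` is a hypothetical app server that reads and writes every session through it.

```python
import json
from typing import Dict, List


class SharedSessionStore:
    """Stand-in for a replicated database or cache pool (in-memory here)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def load(self, session_id: str) -> dict:
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else {"basket": []}

    def save(self, session_id: str, session: dict) -> None:
        # Serialised, as it would be on its way to a real external store.
        self._data[session_id] = json.dumps(session)


class StatelessAppServer:
    """Hypothetical app server that holds no session state of its own."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_basket(self, session_id: str, item: str) -> List[str]:
        session = self.store.load(session_id)
        session["basket"].append(item)
        self.store.save(session_id, session)
        return session["basket"]

    def view_basket(self, session_id: str) -> List[str]:
        return self.store.load(session_id)["basket"]


store = SharedSessionStore()
web1 = StatelessAppServer("web1", store)
web2 = StatelessAppServer("web2", store)

web1.add_to_basket("sess-42", "book")  # first request lands on web1
web2.add_to_basket("sess-42", "lamp")  # next one lands on web2, no stickiness
del web1                               # web1 dies
basket = web2.view_basket("sess-42")   # web2 still sees the whole session
```

Because any node can serve any request, the load balancer in front never has to look inside the stream, which is the whole argument for keeping it at layer 4.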

What they do in practice and then regret later:

  • Put it in the standard local PHP or ASP/.Net container! Arrrggghhhh... NB. There are now well established routes to making the standard containers persistent.

I will come back to this later…

And I’m back…

Times have moved on, and I must admit Layer 7 load balancers are pretty useful tools to have in your armoury.
You can do all sorts of application tuning, scaling, compressing, application fire-walling etc…

But every time we discuss this in the office we come back to the same argument: fix your application first!

Why would you use a load balancer to stop a Slowloris or HTTP GET flood? Why not just use your application cluster?

Why do compression on the load balancer when you can do it in the cluster? Why not just use your application cluster?

Why do rate limiting on the load balancer? Why not just use your application cluster?

At the end of the day you are still going to need to figure out exactly which rules you wish to implement and you will need to be fully aware of the consequences of your actions.
If you rate limit or quality shape traffic, doesn’t that make a DDoS attack against you even easier to achieve?
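
One way to read that question, sketched in Python: if the limit is a single bucket shared by all clients, an attacker only has to use up the allowance to lock everybody else out. The `TokenBucket` class and the numbers (10 requests per second, burst of 10) are assumptions for illustration, not a recommendation or any vendor's implementation.

```python
class TokenBucket:
    """Plain token bucket; the clock is passed in so the demo is deterministic."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket shared by every client of the site (the assumption being tested).
site_limit = TokenBucket(rate_per_sec=10, burst=10)

# The attacker fires 100 requests in the same instant.
attacker_allowed = sum(site_limit.allow(now=0.0) for _ in range(100))

# A real customer turns up 50 ms later and finds the bucket empty.
legit_allowed = site_limit.allow(now=0.05)
```

Per-client buckets avoid this particular trap, but then you are tracking per-client state on the load balancer, which is exactly the kind of rule whose consequences you need to understand before switching it on.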

Maybe what I’m getting at is that nothing is ever easy; you won’t get any silver bullet solution…

So maybe my position has changed:

If you are buying and implementing a low end load balancer for your application cluster then stick to layer 4 load balancing and optimize your application cluster as required.
If your application requires a simple layer 7 proxy implementation including cookies etc., then fine, go ahead – after all you have no choice…

BUT if you are starting to mess around with rate limiting/URL and cluster grouping rules/application firewalls etc.

Go and buy an F5 and talk the solution over in detail with their excellent support people before even considering implementing anything that could break your application in ways you never expected…

Do not believe that a Barracuda Networks or Kemp Technologies or even Loadbalancer.org appliance would be a sensible solution…

Updated for 2022...

Wow, 15 years have passed since I wrote this blog!

So do I still think that layer 7 load balancers suck?

Yes, instinctively as an engineer I still do... BUT:

"HAProxy from Willy Tarreau is now such an awesome, flexible and indestructible piece of open source layer 7 reverse proxy magical genius."

And I have to admit — I now love it!

I may also be slightly biased by the fact that here at Loadbalancer.org we spend all day every day ensuring that our partners' and customers' applications get all the lovely sticky plaster solutions they could ever want!

"Because when a single point of failure is not an option, you want zero downtime from the load balancer experts."