One of the (many) traditional problems with load balancing is the requirement to change your infrastructure in order to implement a hardware load balancer.

Traditional DNS-based round robin was easy: you just added extra IP addresses to your A record. With a hardware load balancer, however, you need to get it between your clients and your servers. Some of the original units, such as the Cisco LocalDirector 416, could be used in 'bridge mode', where traffic was physically forced to pass through the load balancer hardware and the packets were changed on the fly. Although this was fairly transparent, it introduced a single point of failure in the load balancer unit. Most modern load balancer hardware is configured in NAT mode (like a firewall), where traffic is translated from an external subnet to an internal one while the load balancing of packets is carried out.

The advantages of NAT mode are:

  • Works with all backend servers (real servers) simply by changing their default gateway to point at the load balancer
  • Fairly high performance, as it works like a router (faster than your average firewall)
  • Enables traffic inspection, translation and reporting on both inbound and outbound traffic
  • Transparent to the real servers (i.e. server logs show the correct client IP address)

But the big disadvantage of NAT mode load balancing is that you need to move your backend servers into a different subnet.
This can be a real pain in the neck...

  • NAT requires both an external (public) and an internal (private) subnet
  • All the backend servers must use the load balancer as their default gateway
  • Any non-load-balanced services (DNS, SMTP etc.) must have specific firewall pinholes or routes created for them
  • Often all internal services can be masqueraded through the load balancer's external IP
  • When you set things up, you often need to physically change your architecture (network cabling)
  • When something goes wrong, you often need to physically change your architecture (network cabling)
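As a concrete sketch of the NAT approach, here is roughly what a two-subnet, NAT-mode virtual service looks like using Linux Virtual Server's ipvsadm (a hypothetical example — all IPs, ports and interface details below are placeholders, not from the original article):

```shell
# Hypothetical NAT-mode (masquerading) setup using LVS/ipvsadm.
# 192.168.1.100 = external virtual IP (VIP) clients connect to
# 10.0.0.1      = internal IP of the load balancer (the real servers' default gateway)
# 10.0.0.10/11  = backend "real" servers on the internal (private) subnet

# Allow the load balancer to forward and translate packets between subnets
echo 1 > /proc/sys/net/ipv4/ip_forward

# Create the virtual service on the VIP (round robin scheduling)
ipvsadm -A -t 192.168.1.100:80 -s rr

# Add the real servers in masquerading (NAT) mode (-m)
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.10:80 -m
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.11:80 -m

# On each real server, the default gateway must be the load balancer,
# otherwise return traffic bypasses the NAT and connections break:
#   ip route add default via 10.0.0.1
```

Note how the last step is exactly the default-gateway requirement listed above: it is what forces the return traffic back through the load balancer.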

NB. Layer 7 proxies (F5, Zeus, HAProxy etc.) in non-transparent mode don't have these issues (but they are very computationally expensive). In transparent mode they must be set up in the same manner as NAT, with an internal subnet and the default gateway pointing at the load balancer.
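To illustrate why a non-transparent layer 7 proxy needs no routing changes, here is a minimal hypothetical HAProxy configuration (server names and IPs are placeholders): the real servers see connections coming from the proxy's own IP, so no subnet or default-gateway changes are needed — at the cost of losing the client IP in the server logs.

```shell
# Write a minimal, hypothetical non-transparent HAProxy config.
# Backends reply to the proxy (their normal routing), not to the client.
cat > /etc/haproxy/haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend web_in
    bind 192.168.1.100:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
EOF
```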

Direct Routing (Direct Server Return) is the only transparent load balancing technique that doesn't require the default gateway to point at the load balancer.
The advantages of Direct Routing are:

  • Full transparency: The servers see a connection directly from the client IP and reply to the client through the normal default gateway.
  • No infrastructure changes required: The load balancer can be on the same subnet as the backend servers.
  • Lightning fast: Only the destination MAC address of the packets is changed, and multiple return gateways can be utilized for true multi-gigabit throughput.

The disadvantages of Direct Routing are:

  • Each backend server must respond to both its own IP (for health checks) and the virtual IP (for load balanced traffic)
  • Port translation or cookie insertion cannot be implemented.
  • The backend server must not reply to ARP requests for the VIP (otherwise it will steal all the traffic from the load balancer)
  • Prior to Windows Server 2008, some odd routing behavior could occur in <2% of Windows Server installations.
  • In some situations either the application or the operating system cannot be modified to utilize Direct Routing.
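As a sketch of what Direct Routing looks like in practice (assuming LVS/ipvsadm on the load balancer and Linux real servers — all IPs below are placeholder values), there are two halves to the setup: gatewaying mode on the load balancer, and the VIP plus ARP suppression on each backend:

```shell
# Hypothetical Direct Routing (DSR) setup using LVS/ipvsadm.
# 192.168.1.100 = virtual IP (VIP); load balancer and real servers
# all sit on the same 192.168.1.0/24 subnet.

# --- On the load balancer ---
ipvsadm -A -t 192.168.1.100:80 -s rr
# -g selects gatewaying (direct routing): only the destination MAC is rewritten
ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.10:80 -g
ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.11:80 -g

# --- On each real server ---
# Bring the VIP up on loopback so the server accepts packets addressed to it
ip addr add 192.168.1.100/32 dev lo

# Suppress ARP replies for the VIP, so the real server doesn't steal
# traffic from the load balancer (the ARP problem listed above)
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```

The two sysctls and the loopback VIP correspond directly to the disadvantages above: the server must answer on the VIP without advertising it via ARP.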

In my personal opinion, if you can use Direct Routing then you should use it.
Lori MacVittie has a few extra disadvantages listed here.