Loadbalancer.org has always given high availability the utmost priority in its product design. However, prior to v7.6, cluster recovery without downtime (i.e. re-synchronizing the master and slave appliances after a cluster failure) was a slightly convoluted process: possible, but not simple. Loadbalancer.org support staff often recommended a full heartbeat restart on both nodes as the simplest solution, even though it involved a small amount of downtime. We've made a big effort to ensure that in v7.6 this process is as simple as possible in the rare event of a cluster hardware or software failure.

The Scenario:

A v7.6 clustered pair has previously been configured and the master unit has now failed. For a correctly configured pair, the slave unit will automatically take over to ensure that load balanced services continue to be available.
From a high-availability perspective, we ideally want to leave the slave live as the primary handling all traffic and bring a new master back online with no downtime. This is now much simpler in the Loadbalancer.org appliance.
Barracuda still have a few issues with their load balancer when it comes to cluster recovery:

--snip

  1. After verifying the configuration, complete the following at the same time:
    • Shutdown the Secondary device, and
    • Connect and power up the new Primary device to the production network.

--snip
Kemp Technologies' cluster recovery seems nice and simple, but actually configuring the cluster in the first place is a bit convoluted.

Action Required:

To restore HA, the master unit must first be repaired/replaced and then reintroduced into the cluster. Prior to v7.6, heartbeat was generally restarted on both units, or both units were rebooted, to simplify this process. From v7.6 this is easier. The steps below illustrate the revised recovery process.

Outline Steps:

  1. Repair/replace the master appliance
  2. Restore the configuration from backup
  3. Reload heartbeat on the master appliance

** N.B. Selecting Reload rather than Restart is critical to this process **

Detailed steps:

  1. Repair/replace the faulty master
  2. Power up the repaired/new master appliance and, using the network setup wizard, configure the IP address, gateway and DNS server(s)
  3. Using the WUI option: Maintenance > Backup & Restore, click Upload XML file & Restore, then select the backup XML file for the master
  4. Click Upload and confirm the check message to continue
  5. Once the XML restore is complete, heartbeat is stopped to prevent peer interference, as explained in the yellow onscreen warning message
  6. Using the WUI option: Maintenance > Restart Services, click Reload Heartbeat

*** N.B. Selecting Reload rather than Restart is critical to this process ***

  7. Once heartbeat has reloaded, the master and slave are automatically re-synchronized. The master will be passive and the slave will remain active; this occurs with no disruption to load balanced services
  8. Verify that the master displays:
    [image: status1]
    and the slave displays:
    [image: status2]
  9. To force the master to go active and the slave passive, during a maintenance window click the [Advanced] option in the green Information box, then click the Take Over button
    [image: exch2013]
  10. Verify that the master now displays:
    [image: status3]
    and the slave now displays:
    [image: status4]
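
To make the expected role changes easier to follow, here is a minimal Python sketch of the sequence described above. It is a toy model only: the Node and Cluster classes and their method names are hypothetical, invented for illustration, and do not correspond to any appliance API or to how heartbeat is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One appliance in the pair (hypothetical model, not appliance code)."""
    name: str
    online: bool = True
    active: bool = False


class Cluster:
    """Toy active/passive pair that mirrors the recovery steps in this post."""

    def __init__(self) -> None:
        self.master = Node("master", active=True)
        self.slave = Node("slave")

    def fail_master(self) -> None:
        # The master fails and the slave automatically takes over.
        self.master.online = False
        self.master.active = False
        self.slave.active = True

    def reload_heartbeat_on_master(self) -> None:
        # The repaired master rejoins as passive; the slave stays active,
        # so there is no interruption to load balanced services.
        self.master.online = True
        self.master.active = False

    def take_over(self) -> None:
        # Models the Take Over button: the master goes active, the slave passive.
        if not self.master.online:
            raise RuntimeError("master must be back online before taking over")
        self.master.active = True
        self.slave.active = False

    def roles(self) -> dict:
        # Report each node as "offline", "active" or "passive".
        result = {}
        for node in (self.master, self.slave):
            if not node.online:
                result[node.name] = "offline"
            else:
                result[node.name] = "active" if node.active else "passive"
        return result


cluster = Cluster()
cluster.fail_master()
print(cluster.roles())
cluster.reload_heartbeat_on_master()
print(cluster.roles())
cluster.take_over()
print(cluster.roles())
```

In this model exactly one node is active after every step, which is the property the Reload option is meant to preserve on the real appliances.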

Conclusion:

From v7.6 the cluster can be restored to full working order in a much simpler way. It's also now very easy to return the master to be active and the slave passive using the new Take Over button.

These improvements are on top of the previous ones detailed in this post describing changes to default settings to enhance load balancer availability.