As a new member of the Loadbalancer team, I have been given the unenviable task of explaining how to recover your cluster should your master fail. (Note: fail as in hardware failure, not just unplugging the network cable!)
NB. This process is designed for v6.6 through v6.18 Loadbalancer.org appliances
Hopefully you have the backup configuration file, lb_config.xml, to hand.
If you don’t have a copy already and you’re looking at this article because you’re organised and planning for disaster, NOW might be a good time to go and get one.
Log into your master Loadbalancer, select “Maintenance”, click on “Disaster Recovery” then “Download XML Configuration file”, and keep the downloaded file somewhere safe.
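If you want to keep more than one copy around, here is a minimal sketch for filing each download under a dated name. The function name archive_config and the backup directory are my own inventions for illustration, not part of the appliance.

```shell
# Hypothetical helper (not an appliance tool): copy a downloaded
# lb_config.xml into a backup directory under a timestamped name.
# Usage: archive_config ~/Downloads/lb_config.xml ~/lb-backups
archive_config() {
    local src="$1" dest_dir="$2" stamp target
    # Refuse to archive a missing or empty download.
    [ -s "$src" ] || { echo "missing or empty: $src" >&2; return 1; }
    stamp=$(date +%Y%m%d-%H%M%S)
    target="$dest_dir/lb_config-$stamp.xml"
    # Print the path of the new copy so it can be logged or checked.
    mkdir -p "$dest_dir" && cp -p "$src" "$target" && echo "$target"
}
```

Run it once for the master's file and once for the slave's, into separate directories, so you can tell the two apart later.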
If you’re reading this and your cluster has failed and you don’t have a backup, there are a few ways of recovering the file.
The simplest way is to take a copy of the lb_config.xml from the slave machine and edit the following lines. (NOTE: if your setup is more complicated and uses firewall marks or SSL, or you’re not sure what other changes to make, it’s probably best to contact email@example.com.)
Change lbslave to lbmaster.
Set the slave address entry to 192.168.2.81 (replace this IP address with the address of your slave machine).
Change 192.168.2.81 to 192.168.2.80, that is, change the IP address from the address of your slave machine to the address of the master.
That should be it. Save the file and we can carry on with recovering the master.
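If you would rather script those edits than make them by hand, the following is a rough sketch using sed. It rests on an assumption the article does not confirm: that the entry being set to the slave's address currently holds the master's address, so the two IP edits amount to swapping the two addresses. The function name and the @@SLAVE@@ placeholder are made up for this example.

```shell
# Hypothetical helper: derive a master config from a copy of the slave's.
# ASSUMPTION: the two IP edits are a straight swap of the slave and
# master addresses wherever they appear in the file.
# Usage: slave_to_master_config SLAVE_IP MASTER_IP < slave.xml > master.xml
slave_to_master_config() {
    local slave_ip="$1" master_ip="$2" slave_re master_re
    # Escape the dots so sed matches them literally.
    slave_re=$(printf '%s' "$slave_ip" | sed 's/\./\\./g')
    master_re=$(printf '%s' "$master_ip" | sed 's/\./\\./g')
    # Park the slave address on a placeholder so the swap does not
    # overwrite itself part-way through.
    sed -e "s/lbslave/lbmaster/g" \
        -e "s/$slave_re/@@SLAVE@@/g" \
        -e "s/$master_re/$slave_ip/g" \
        -e "s/@@SLAVE@@/$master_ip/g"
}
```

With the addresses used above that would be slave_to_master_config 192.168.2.81 192.168.2.80. The patterns are plain substring matches, so compare the result against the original with diff before trusting it.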
- Shut down the master if possible; if it’s already off you can skip this step.
- Disconnect the Heartbeat (Serial) cable and the network cable.
- Repair whatever problems you are having with the master.
- Connect a mouse, monitor and keyboard to the server.
- Restore the master from the Loadbalancer image. Visit https://www.loadbalancer.org/blog/how-to-recover-your-load-balancers-to-v65-via-usb-stick for instructions on how to restore from an image; if you need help, contact firstname.lastname@example.org
- At no point during the restore should you reconnect the cables!
- Log onto the machine with:
User name: root
and at the terminal stop the heartbeat service by issuing the command: “service heartbeat stop”
- Load your lb_config.xml and lbrecoverv66 onto a USB stick. The script can be found at http://downloads.loadbalancer.org/support/recoveryscripts/lbrecoverv66.txt (right-click on lbrecoverv66 and click “Save link as”).
- Insert the USB stick into the server
- Enter the command: “fdisk -l”
This should give you a list of the drives attached to your machine, from which you should be able to work out which one is the USB stick. If you only have one drive in your server then it’s probably /dev/sdb1.
- To mount the device enter the command below:
“mount /dev/sdb1 /mnt” then you can “cd /mnt” and a simple “ls” should show you your files.
- Enter the command:
“php lbrecoverv66” at the terminal to load the recovery script, wait for 2 minutes, then shut down the machine by entering:
“shutdown -h 0”
- Reconnect the Heartbeat (Serial) cable and the network cable
- Turn the machine back on
- Restart the Heartbeat on the master by logging into the web interface on your recovered master and clicking “Maintenance” then “Restart Heartbeat”.
Wait a few minutes and your cluster should be restored!
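For reference, the console commands from the steps above can be gathered into one small function. This is only a sketch: the names run and recover_from_usb are mine, and the device still has to be picked by eye from the “fdisk -l” output first. Setting DRY_RUN=1 prints each command instead of running it, which is a safe way to read through the sequence.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the console steps in this article.
# Usage (as root on the restored appliance): recover_from_usb /dev/sdb1
recover_from_usb() {
    local device="$1"
    [ -n "$device" ] || { echo "usage: recover_from_usb DEVICE" >&2; return 1; }
    # Each step runs only if the previous one succeeded.
    run service heartbeat stop &&
    run mount "$device" /mnt &&
    run cd /mnt &&
    run php lbrecoverv66 &&
    run sleep 120 &&    # the article says to wait about 2 minutes
    run shutdown -h 0
}
```

Chaining the steps with && means a failed mount stops the sequence before the recovery script is attempted.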
Recovering from Slave failure
If, instead, your Slave loadbalancer has failed, here's how to recover it. Again, the easiest way is if you have a backup of the lb_config.xml from the Slave. If you haven't backed that up as well, you might like to do it now.
If the Slave has failed and you do not have a backup of its config, below is a procedure for recovering from the Master.
- Download the Master's config, using Maintenance > Disaster Recovery > Download XML configuration file.
- In a text editor, make the following changes to the downloaded config:
In the section,
Remove any IP address in the
<physical> <rip> section, change the IP addresses for eth0 and eth1 as necessary.
- Save your modifications.
- If the Slave loadbalancer is still connected to the network and the serial link, disconnect it.
- Fix the slave server.
- With the slave still disconnected from the network and serial link, restore the Loadbalancer image, using the instructions at https://www.loadbalancer.org/blog/how-to-recover-your-load-balancers-to-v65-via-usb-stick
- When the install is complete, log in to the slave with username
- At the root prompt, run the following code to stop the heartbeat service.
service heartbeat stop
- Transfer the master's config and recovery script, from http://downloads.loadbalancer.org/support/recoveryscripts/lbrecoverv66.txt, to the new slave on a USB key.
- Plug the USB key into the slave. Run the command
fdisk -l to discover the device that the system has allocated to the USB key. It will usually be
- Mount the USB key using
mount <device> /mnt, and change to the directory with the config and recovery script.
- Start the recovery by running
php lbrecoverv66. When the prompt returns, wait a couple of minutes then shut down the slave server with
shutdown -h 0
- Reconnect the slave to the network and the serial link, and restart the server.
- When the slave has finished booting, log on to the master's web interface and restart the heartbeat service using Maintenance > Restart Heartbeat.
Your cluster will now be restored.
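One thing that can trip up either procedure is starting the recovery script from a USB stick that is missing one of the two files. Below is a small, hypothetical pre-flight check (the function name is mine) that you could run after mounting the stick. It looks for the script under the name the php command uses, so a download that kept its .txt extension would be flagged.

```shell
# Hypothetical pre-flight check: confirm both recovery files are present
# and non-empty in the directory where the USB stick is mounted.
# Usage: check_recovery_files /mnt
check_recovery_files() {
    local dir="${1:-/mnt}" name missing=0
    for name in lb_config.xml lbrecoverv66; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            missing=1
        fi
    done
    # Returns 0 only when both files were found.
    return "$missing"
}
```

For example, check_recovery_files /mnt && php lbrecoverv66 would only start the recovery when both files are where the script expects them.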