Cloning workstations with Linux LG #89

By Alan Ward

Anybody who has had to install a park of 10 to 100 workstations with exactly the same operating system and programs will have wondered if there is a neater - and faster - way of doing it than moving the CDs around from box to box. Cloning consists of installing a model workstation setup once, and then copying it to all the others.

The purpose of this text is to explore several of the many ways of cloning a workstation hard disk configuration. In the cloning process, we will use the native possibilities of Linux to produce more or less the same effect as the well-known Norton Ghost of the Windows world.

Though we will be booting the workstations under Linux, the final operating system they will be running may or may not be Linux. Actually, I use this system for a park of Windows ME workstations that get reformatted at least once a year - for obvious reasons.

Hard disk switching

The oldest way of cloning a hard disk requires two workstations (A is the model, B is the clone), and another computer C. Only C needs to run Linux.

We take the hard drives out of the two workstations, and add them to C. Take care to leave C's original hard disk in the first IDE position. For example:

	IDE bus 0, master	=>	C's hard disk	=>	/dev/hda
	IDE bus 0, slave	=>	A's hard disk	=>	/dev/hdb
	IDE bus 1, master	=>	B's hard disk	=>	/dev/hdc

We then get to copy the contents of /dev/hdb to /dev/hdc. If they are both the exact same model, we can get by with a plain byte-by-byte copy:

	dd if=/dev/hdb of=/dev/hdc

or even:

	cp /dev/hdb /dev/hdc

These are the easiest ways of doing the copy. However, you should be aware of the following points:

The hard disks must be the exact same model: you may have problems with newer or older revisions of the same hard disk.
You may have problems with bad sectors on either A or B.
You are also copying all the blank parts of the A disk to B; this can take some time and is useless for our purposes.
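
As a rough sketch, the plain copy above can be wrapped with a larger block size and a verification pass. The helper name clone_disk and the 64k block size are assumptions of this example, and it supposes that both disks are the same size, as discussed above.

```shell
#!/bin/sh
# clone_disk SRC DST
# Hypothetical helper: copy SRC onto DST byte for byte, then read both
# back and compare them.
clone_disk() {
    src=$1
    dst=$2
    # bs=64k is an arbitrary choice; dd's default block size of
    # 512 bytes is usually much slower on whole disks.
    dd if="$src" of="$dst" bs=64k 2>/dev/null || return 1
    # cmp prints nothing and returns 0 only if both are identical.
    cmp "$src" "$dst"
}

# Example (this destroys everything on /dev/hdc):
#   clone_disk /dev/hdb /dev/hdc && echo "copy verified"
```

The verification pass reads both disks a second time, so it roughly doubles the time taken; leave it out if you are in a hurry.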

This way can be the best for people using bootloaders such as lilo or grub, as the bootsector is copied along with the rest.

The second, slightly more complicated way of copying A to B consists of three steps:

First, you get to set up B's partition table (with fdisk, cfdisk, ...)
You then format B's partitions (with mkfs.ext2, mkfs.vfat, mkswap)
You do the actual copying.

In this case, copying means mounting the corresponding partitions of both disks (here the first partition of each; repeat for any others):

	mkdir -p /mount/A /mount/B
	mount /dev/hdb1 /mount/A
	mount /dev/hdc1 /mount/B
	cp -dpR /mount/A/. /mount/B
	umount /mount/A ; umount /mount/B
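
If your cp does not know the -d or -p options (this may be the case on a tiny rescue system), a tar pipe does much the same job. This is only a sketch; the helper name copy_tree is an assumption of this example.

```shell
#!/bin/sh
# copy_tree SRC_DIR DST_DIR
# Hypothetical helper: copy everything under SRC_DIR into DST_DIR
# through a tar pipe, including top-level dot files.
copy_tree() {
    src=$1
    dst=$2
    # "tar cf - ." archives the directory contents to standard output;
    # "tar xpf -" unpacks from standard input, keeping permissions (p).
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)
}

# Example, with both partitions mounted as above:
#   copy_tree /mount/A /mount/B
```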

This can be a bit of a pain if there are a lot of workstations to clone, but takes less time than a complete install ... and you are sure they have the same configuration.

Important: if you are using a bootloader such as lilo or grub to boot a Linux workstation, you then get to write a personalized bootloader configuration file and install it on disk B's boot sector.

Basically, you need to tell the bootloader:

To use disk /dev/hdc to write the boot sector on; this is where your cloned hard disk currently is.
To use disk /dev/hda to boot from; this is where your cloned hard disk will be when you boot from it.
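
For lilo, these two instructions might translate into a configuration along the following lines. This is only a sketch: the helper name write_clone_lilo_conf, the file name lilo.clone.conf, the kernel path /boot/vmlinuz, the label and the root partition /dev/hda1 are all assumptions of this example, so check them against your own /etc/lilo.conf and the lilo.conf man page.

```shell
#!/bin/sh
# write_clone_lilo_conf FILE
# Hypothetical helper: write a lilo configuration for a disk that is
# /dev/hdc now but will be the first disk (/dev/hda) at boot time.
write_clone_lilo_conf() {
    cat > "$1" <<'EOF'
# Write the boot sector to the clone, currently on /dev/hdc.
boot=/dev/hdc
# Tell lilo that the BIOS will see this disk as the first one (0x80).
disk=/dev/hdc
    bios=0x80
# Kernel path, label and root partition are assumptions.
image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
    read-only
EOF
}

# Example, with the clone's root partition mounted on /mount/B:
#   write_clone_lilo_conf /mount/B/etc/lilo.clone.conf
#   lilo -r /mount/B -C /etc/lilo.clone.conf
```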

Be careful: you may end up having to use your rescue disks if you do this wrong! Been there, done that. You've been warned. Before starting, take a close look at your current /etc/lilo.conf or /boot/grub/menu.lst files, and at their man pages.

Alternatively, if you are just booting Linux you can:

copy the files to disk B
put disk B back into workstation B
boot workstation B from the rescue diskette you built for workstation A when installing the system
run lilo or grub directly

This second way can be much easier for people with less flying time on Linux systems. :-)

Another version of the same setup is, if C's disk is large enough, to copy once from A to C, and then to copy many times from C to B1, B2, B3 ... etc. If your IDE setup has enough busses (or you are using SCSI) you can copy 5 disks or more at a time.
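
Copying to several disks at once could be scripted roughly as follows. The helper name clone_to_many is an assumption of this example, as are the image path and device names in the usage line.

```shell
#!/bin/sh
# clone_to_many IMAGE TARGET...
# Hypothetical helper: write IMAGE to every TARGET in parallel and
# return non-zero if any of the copies failed.
clone_to_many() {
    image=$1
    shift
    pids=""
    for target in "$@"; do
        # One background dd per target disk.
        dd if="$image" of="$target" bs=64k 2>/dev/null &
        pids="$pids $!"
    done
    status=0
    # Wait for each copy in turn and remember any failure.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (destroys everything on the target disks):
#   clone_to_many /images/A.img /dev/hdb /dev/hdc /dev/hdd
```

Disks that share an IDE bus compete for it, so the gain from running copies in parallel depends on how the drives are spread over the busses.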

Needless to say, we use this only if we have no networking set up - a rather uncommon situation these days. However, speed can be rather high as we are working directly at IDE-interface speeds.

Copying over a network

Copying over a network consists of booting workstation B with a diskette or CD into an operating system that can drive the network (let's see now ... here Linux is in, Windows is out) and getting the hard disk image either directly from station A, or more commonly from a file server C. In our examples, I will use workstation B as the computer to be configured, and suppose we have the image files from workstation A copied to a directory on server C.

There are several "tiny" Linux-on-a-diskette distributions available out there. MicroLinux (muLinux) is my favourite, but they all work in similar ways.

The idea is to boot from the diskette, and set up networking.

You can then either:

Have a complete hard disk image on the server, which you then copy onto the local disk with a byte-by-byte copy. As with a direct hard disk to hard disk copy, it is easier to set up, but also has the same caveats.
Have the filesystem available on the server, which means you get to partition the local disk, format the partitions and recursively copy the files from the network onto your disk.

An example of the first way, over NFS:

	mkdir /mount/C
	mount server:/exported.directory /mount/C
	dd if=/mount/C/my.image of=/dev/hda
	umount server:/exported.directory
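
The image file itself has to be made once, from workstation A. One possible way is sketched below, storing the image compressed, which helps a lot when the blank parts of the disk are filled with zeros. The helper names save_image and restore_image and the file name my.image.gz are assumptions of this example.

```shell
#!/bin/sh
# save_image DISK FILE
# Hypothetical helper: store a gzip-compressed image of DISK in FILE.
save_image() {
    dd if="$1" bs=64k 2>/dev/null | gzip -c > "$2"
}

# restore_image FILE DISK
# Hypothetical helper: write the compressed image in FILE onto DISK.
restore_image() {
    gunzip -c < "$1" | dd of="$2" bs=64k 2>/dev/null
}

# Example, with the server directory mounted on /mount/C:
#   on workstation A:  save_image /dev/hda /mount/C/my.image.gz
#   on workstation B:  restore_image /mount/C/my.image.gz /dev/hda
```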

An example of the second (supposing you have already set up and formatted the partitions on local hard disk /dev/hda; the first partition, /dev/hda1, is used here):

	mkdir -p /mount/B /mount/C
	mount /dev/hda1 /mount/B
	mount server:/exported.directory /mount/C
	cp -dpR /mount/C/. /mount/B
	umount /mount/C
	umount /mount/B

In the second case, if you use a bootloader, remember to install it either immediately after copying the files, or after rebooting workstation B from a rescue diskette.

The nice thing about Linux is that in essence, copying an image or separate files from a network is exactly the same as from another hard disk on your computer.

NFS is naturally not the only way of downloading the file or files from server C. There are actually as many suitable protocols as you have available clients on your bootable diskette. I would suggest you use whatever server you already have installed on your network. Some choices:

NFS (Network File System)	This is the native way Un*x systems use to share files; robust and easy to set up. My favourite.
HTTP (as in Web server)	Easy to set up on the server side, but it can be difficult to find a suitable client. Used mainly with automated install scripts. You may already have one of these running.
FTP	Less easy on the server side, but very easy to find clients. You may already have one of these running.
TFTP (trivial FTP)	Very easy to set up on the server, very easy to use the client. Many routers (e.g. Cisco) use tftp to store their configuration files.
SMB (or Netbios)	Yes, this works. Your server can run either Linux + Samba or any version of WinXX. The client Linux system on workstation B can mount volumes using smbmount. Why you would ever want to is your business, though.
rcp or scp	Remote copy between hosts. scp is preferable for security, as it runs over ssh.
rsync	Another favourite of mine. Used normally to synchronize a back-up file or web server to the main server. This can be a bit of a security hole if server C is accessible from outside your network, so take care to block this on your firewall. Can compress data in transit (option -z).
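
To see which of these you can actually use from a given boot diskette, a small check along these lines may help. The helper name available_clients and the list of program names in the usage line are assumptions of this example; adapt them to what your diskette distribution ships.

```shell
#!/bin/sh
# available_clients NAME...
# Hypothetical helper: print, one per line, those of the given
# programs that are found on the PATH.
available_clients() {
    for client in "$@"; do
        if command -v "$client" >/dev/null 2>&1; then
            echo "$client"
        fi
    done
}

# Example:
#   available_clients rsync scp wget ftp tftp smbmount
```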

There is a recent on-a-CD distribution called Knoppix that boots you directly into a KDE desktop. From here, you can use all your regular graphics-based file tools if you are so inclined.

Booting from the network

A final twist is to boot workstation B directly from the network without using a boot diskette. The idea is to tell the BIOS to load a minimum network driver from an EPROM. Control is then passed to this driver, which goes onto the network searching for a DHCP server it can get an IP address and a kernel image from. It then boots the kernel, which in turn gets the root filesystem from an NFS server.

By this time, workstation B is up and running with a Linux system. You can then format its local hard disk and copy files from the server.

Needless to say, this is rather more complicated to set up than a diskette or CD Linux system. However, the process can be completely automated and suits large networks with many workstations that must be reconfigured often.

Another twist of the same is to dispense completely with the local hard disks on workstations B1, B2, B3 ... and have them boot each time from the network. Users' files are stored on the central NFS file server.

Further reading

Another program used by many scientific cluster administrators is dolly. I've heard a lot of good about it, but have not tried it out yet.

On booting from a network, look up etherboot or, if your hardware supports it, PXE.

PS. Should anybody want to translate this article: I wrote it in the spirit of the GPL software licence, i.e. you are free (and indeed encouraged) to copy, post and translate it -- but please, PLEASE, send me notice by email! I like to keep track of translations -- it's good for the curriculum :-)

[BIO] Alan teaches CS in Andorra at high-school and university levels. His hobbies include science photography (both digital and traditional), trekking, rock and processor collecting.