Installing a Linux Single System Image Cluster
==============================================

0: Introduction
---------------

The minimum hardware requirements for an interesting cluster are two or
more computers, connected with an ethernet network. In this
configuration, the Cluster File System (CFS) is used to share the first
node's root disk with the other nodes in the cluster. CFS is essentially
a highly coherent NFS that stacks on an ordinary file system such as
ext2 or ext3. These instructions describe how to set up an SSI cluster
with a CFS shared root.

If you have shared disk hardware, such as FCAL or shared SCSI, you can
use the INSTALL.gfs instructions to set up a Global File System (GFS)
root. GFS is a parallel physical file system that coordinates multiple
machines directly accessing the same disk. GFS requires a machine
outside the cluster to act as a lock server.

If you only have a single machine, you can play with an SSI cluster of
virtual User-Mode Linux (UML) machines. For more information, look at

    http://ssic-linux.sourceforge.net/ssiuml-howto/index.html

I: Getting started
------------------

1) Install your distribution of choice on the first node. Most of the
   development and testing has been done with RedHat 7.2 and 7.3
   (server configuration). Aneesh Kumar maintains support for Debian.
   Support for Caldera was added over a year ago, but it has not been
   well maintained.

   Other distributions are not currently supported, but SSI clustering
   probably works on them. If you get it to work on another
   distribution, please send an e-mail to the SSI mailing list
   (ssic-linux-devel@lists.sf.net) describing what you had to do
   differently from the instructions below.

   There's no need to install a distribution on any node other than
   the first one.

2) Do _not_ configure a separate /boot partition.

3) Configure LILO as the boot loader, rather than GRUB. The Open SSI
   project will eventually use GRUB, but not now.

4) Be sure to install the following packages. They are not installed by
   default under RedHat 7.2:

   - nasm
   - dhcp
   - tftp-server

II: Configuring and installing the cluster kernel
-------------------------------------------------

1) If you're not familiar with how to configure and install a Linux
   kernel in general, read this:

      http://linuxdoc.org/HOWTO/Kernel-HOWTO.html

2) Download and extract vanilla 2.4.18 Linux source:

      bunzip2 linux-2.4.18.tar.bz2
      tar xvf linux-2.4.18.tar

3) Apply the SSI patch (also includes various 3rd party software):

      cd linux
      patch -p1 < ssi-linux-2.4.18-v0.7.5.patch

4) Configure kernel options:

      cp config.i586 .config
      make xconfig
         - or -
      make menuconfig

   Make sure the following are enabled:

      Block devices->
         Loopback device support
         RAM disk support
         Default RAM disk size = 9000 kB
         Initial RAM disk (initrd) support
      File systems->
         /dev file system support
      Kernel hacking->
         Built-in Kernel Debugger support
         Compile the kernel with frame pointers
      Clustering->
         SSI Clustering
         Cluster Filesystem (CFS)

   Disable the following:

      File systems->
         /dev/pts file system for Unix98 PTYs

   Compile your disk and network drivers as modules.

   If you're building a cluster that includes different x86 processor
   families, compile for the oldest processor you have. For example,
   if you have several Pentium IIs and a classic Pentium, select
   classic Pentium under 'Processor type and features->Processor
   family'.

5) If you'd like to include more than 15 nodes in your cluster, edit
   NSC_MAX_NODE_VALUE in include/cluster/config.h. This can be set as
   high as 125, although 32 nodes is the most that has been tested so
   far.

6) Build the kernel and modules:

      make bzImage modules

7) Install the kernel and modules:

      cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.18-ssi
      make modules_install

III: Configuring the cluster
----------------------------

1) Make and install the SSI cluster tools for your distribution:

      cd cluster-tools-0.7.5
      make install_ssi_redhat
         - or -
      make install_ssi_debian
         - or -
      make install_ssi_caldera

2) During cluster tools installation, answer the questions for
   configuring the first node.

3) Edit /etc/fstab and replace LABEL=/ with the real name of the root
   device (e.g., /dev/sda1). Make sure the line for the root device is
   the first in /etc/fstab.

4) Enable the TFTP server by editing /etc/xinetd.d/tftp, and setting
   'disable' equal to 'no'.

5) Disable kudzu by removing /etc/rc?.d/S05kudzu. It doesn't work well
   in a shared root environment.

6) Run cluster_mkinitrd to set up your ramdisk. The arguments are the
   same as mkinitrd. If you built your network driver as a module, you
   must specify it on the command-line (e.g., eepro100):

      cluster_mkinitrd --with=eepro100 --cfs \
         /boot/initrd-2.4.18-ssi.img 2.4.18

   Special note for users of RTL8139 and PCNET32 cards: be sure to
   include the mii module _before_ your networking module when
   building your ramdisk. For example:

      cluster_mkinitrd --with=mii --with=8139too --cfs \
         /boot/initrd-2.4.18-ssi.img 2.4.18

7) Edit /etc/lilo.conf to add a stanza for your new kernel and ramdisk.
   Make sure you include the following lines in the stanza:

      label=ssi
      initrd=/boot/initrd-2.4.18-ssi.img
      root=/dev/ram0
      append="init=/linuxrc"
      read-write

   Make this stanza the default kernel by adding a 'default=ssi' line
   to the global options. This will be important when you later add
   dependent nodes using Etherboot.

8) Run lilo:

      /sbin/lilo -c

9) Edit /etc/fstab to mount ssidevfs on /devfs (it can't be on /dev,
   yet). Add the following line:

      none    /devfs    devfs    defaults    0 0

   Delete or comment out the line for /dev/pts.

10) Set up node-specific networking.

    Because of the shared root filesystem, the network configuration
    files are shared amongst all nodes in the cluster. This becomes a
    problem when more than one node tries configuring its interface
    using the same network parameters. Our solution to this problem is
    to create node-specific copies of the network config directories
    and mount these over the base directory using the mount command's
    "--bind" option.

    a) Create the node-specific network config directory for Node 1:

          cp -a /etc/sysconfig/network-scripts /etc/sysconfig/network-scripts-1

    b) Add the following 3 lines to /etc/rc.d/rc.sysinit to mount the
       node-specific config directories. These lines should be added
       after the root and all other filesystems have been mounted. The
       lines to add are marked with '->'; the marker itself is not part
       of the file:

          ...
          ...
          # Mount all other filesystems (except for NFS and /proc, which is
          # already mounted). Contrary to standard usage, filesystems are NOT
          # unmounted in single user mode.
          action $"Mounting local filesystems: " mount -a -t nonfs,smbfs,ncpfs -O no_netdev

       -> # Mount node-specific config directories.
       -> mount --bind /etc/sysconfig/network-scripts-`clusternode_num` \
       ->     /etc/sysconfig/network-scripts

          # check remaining quotas other than root
          if [ X"$_RUN_QUOTACHECK" = X1 -a -x /sbin/quotacheck ]; then
          ...
          ...

    c) Make any node-specific networking changes in the
       network-scripts-`clusternode_num` directory. (Since networking
       was set up for Node 1 already, you shouldn't need to make any
       changes now.)

11) Reboot with your SSI kernel. You now have a single node cluster.

IV: Adding new nodes
--------------------

1) Select an interface on the new node for the cluster interconnect. It
   must have a chipset supported by Etherboot, PXE, or a similar
   DHCP/TFTP netboot technology, and it must be discovered by netboot
   before any other card with the same chipset. It's best to use the
   lowest numbered card of a supported chipset (e.g., eth0 rather than
   eth1, if they are both eepro100 cards).

2) If the selected interface does not already have an installed netboot
   image, download an Etherboot image from the following URL:

      http://rom-o-matic.net/5.0.5/

   Choose the appropriate chipset. Under 'Configure' it is recommended
   that ASK_BOOT be set to 0. 'Floppy Bootable ROM Image' is the
   easiest format to use. Just follow the instructions for writing it
   to a floppy.

3) Connect the selected interface to the same physical network as the
   first node, stick the newly created boot floppy in its drive (if
   needed), and boot the computer. It should display the MAC address of
   the interface it is attempting to netboot with, then hang while it
   waits for a DHCP server to answer its request.

4) Create the node-specific network config directory for the new node
   on the CFS root, and edit the new node's network settings in there
   (e.g., for new node 3):

      cp -a /etc/sysconfig/network-scripts /etc/sysconfig/network-scripts-3

5) On the first node (or any node already in the cluster), run the
   addnode command. It'll list the unknown MAC addresses that have
   recently probed the cluster. Choose the one that is displayed on the
   console of the new node. If it doesn't appear in the list, try
   rescanning.

6) Select a node number and IP address for the new node. Confirm the
   configuration and addnode will do all the work to admit the node
   into the cluster.

7) Wait for the new node to join the cluster. You can confirm its
   membership with the cluster command:

      cluster -v

   If the new node is still hung searching for the DHCP server, try
   manually restarting the dhcpd daemon on the CLMS master node:

      killall dhcpd

   (It'll be automatically respawned by init.)

8) Configure the new node with swap device(s) using fdisk and mkswap
   (or similar tools). The device name(s) must be the same as the swap
   device(s) on the first node.
   Either reboot the node or manually activate the swap device(s) with
   the swapon command:

      swapon

9) If you wish to configure the new node with a local boot device, use
   fdisk and mkfs (or similar tools) to set up the partition. Format it
   with an ordinary Linux filesystem, such as ext2. Run the chnode
   command on any node, select the new node, enter its boot device
   name, and chnode will finish setting it up.

10) After configuring a local boot device, you may also use the chnode
    command to make the node a potential CLMS master node. This means
    it can take over centralized cluster services if the active CLMS
    master node fails. After changing whether or not a node is a CLMS
    master, the entire cluster must be rebooted for the change to take
    effect.

11) Repeat steps 1-10 for any other nodes you wish to add.

12) Enjoy your new SSI cluster!!!

    One of the first things you can try is running the demo Bruce,
    Scott and I did at Linux World and HP World 2002. It illustrates
    some of the features of SSI clustering. You can find it here, along
    with older demos from 2001:

       http://ssic-linux.sf.net/index.shtml#demos

    If you have questions or comments that aren't answered on the
    website, try posting to the mailing list:

       ssic-linux-devel@lists.sf.net

--
Copyright (C) 2001-2 Hewlett-Packard Company.
All rights reserved.
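
The node-specific network directory step (section III, step 10a, and
section IV, step 4) can be sketched as a small shell helper. This is
only a sketch under stated assumptions: the function name
ssi_make_netdir and the SYSCONFIG_DIR variable are hypothetical and are
not part of the SSI cluster tools. Only the cp -a command comes from the
instructions above.

```shell
#!/bin/sh
# Sketch: create the per-node copy of the network config directory.
# SYSCONFIG_DIR and ssi_make_netdir are illustrative names (assumptions),
# not part of the SSI cluster tools.

SYSCONFIG_DIR="${SYSCONFIG_DIR:-/etc/sysconfig}"

ssi_make_netdir() {
    node="$1"
    # Accept only a node number made of decimal digits.
    case "$node" in
        ''|*[!0-9]*)
            echo "usage: ssi_make_netdir NODE_NUMBER" >&2
            return 2
            ;;
    esac

    src="$SYSCONFIG_DIR/network-scripts"
    dst="$SYSCONFIG_DIR/network-scripts-$node"

    if [ ! -d "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        # Never overwrite settings already edited for this node.
        echo "$dst already exists, leaving it untouched" >&2
        return 0
    fi

    # Same command as in the instructions, with the node number filled in.
    cp -a "$src" "$dst"
}
```

For example, after sourcing this file as root on a cluster node,
"ssi_make_netdir 3" would create /etc/sysconfig/network-scripts-3, which
can then be edited for the new node. The helper refuses to overwrite an
existing directory so that rerunning it cannot discard per-node changes.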