Do-It-Yourself

...or...

How the Cluster Really Works

Intro

Whether you are setting up LAM/MPI, MPICH, or PVM, one thing that will make your life significantly easier is having a simple way to get to the code on multiple machines.

All of these packages need access to the code on every machine. MPI and PVM come with a number of helper applications for starting the code, and these require that the user be able to execute commands on the other cluster machines without giving a password. This is best done using SSH.

Let's assume the following: you are setting up a parallel implementation on a UNIX cluster and need (1) easy access to the code from all machines and (2) the ability to execute commands remotely (but securely) without entering a password. The protocols used to do this are NFS (Network File System), NIS (Network Information Service, formerly YP), and SSH (Secure Shell). Because SSH needs strong authentication, your host names had better agree with your IP addresses whenever names are resolved over DNS.
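One way to check that names and addresses agree is a round trip: look up a host name, look up the resulting IP address in reverse, and confirm you get the same host back. The sketch below does this with Python's standard socket module. Comparing only the short name (so "node1" matches "node1.cluster.example") is an assumption of this sketch, and the resolver functions are parameters only so the logic can be tested without a network.

```python
import socket
from typing import Callable, List, Tuple


def forward_and_reverse_agree(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> bool:
    """Return True if hostname -> IP -> name leads back to the same host."""
    try:
        ip = forward(hostname)              # forward lookup: name -> IP
        name, aliases, _ = reverse(ip)      # reverse lookup: IP -> name
    except (socket.gaierror, socket.herror):
        return False                        # one of the lookups failed outright
    # Compare short names so "node1" matches "node1.cluster.example".
    short = hostname.split(".")[0].lower()
    return any(c.split(".")[0].lower() == short for c in [name] + aliases)


if __name__ == "__main__":
    import sys

    # Usage: python checkdns.py node1 node2 ...
    for host in sys.argv[1:]:
        print(f"{host}: {'ok' if forward_and_reverse_agree(host) else 'MISMATCH'}")
```

Run it from every node against every other node: a name that resolves one way but not the other is exactly the kind of problem that makes SSH logins fail in confusing ways.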

The goal is to have a user account which exists on all machines, has the same home directory regardless of which machine the account is logged into, and can log into one machine from another without having to re-enter a password.
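Once everything is set up, you can test this goal directly. The sketch below logs into each node with ssh in BatchMode (a standard OpenSSH option that makes ssh fail instead of prompting for a password) and runs pwd, which prints the directory the login lands in. The node names are hypothetical, and a matching path is only a hint that the home directory is shared. It does not prove that every node is mounting the same NFS disk.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def ssh_command(host: str, remote_cmd: Sequence[str]) -> List[str]:
    # BatchMode=yes: fail instead of asking for a password.
    # ConnectTimeout=5: give up quickly on a node that is down.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, *remote_cmd]


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def check_nodes(
    hosts: List[str],
    runner: Callable[[List[str]], subprocess.CompletedProcess] = run,
) -> Dict[str, str]:
    """Map each host to 'ok', or to a short description of what went wrong."""
    status: Dict[str, str] = {}
    homes: Dict[str, str] = {}
    for host in hosts:
        proc = runner(ssh_command(host, ["pwd"]))  # a login starts in $HOME
        if proc.returncode != 0:
            status[host] = "FAIL: ssh needs a password or host is unreachable"
        else:
            homes[host] = proc.stdout.strip()
            status[host] = "ok"
    # With a shared NFS home, every reachable node should report the same path.
    if len(set(homes.values())) > 1:
        for host, home in homes.items():
            status[host] = f"WARN: home directory differs ({home})"
    return status


if __name__ == "__main__":
    for host, result in check_nodes(["node1", "node2"]).items():  # hypothetical names
        print(host, result)
```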

With that in mind, if you wanted to create your own cluster, you would need to do the following on each machine:

Configure DNS. Each machine should have a host name and a valid IP address. If your machines are getting their IP addresses over DHCP, make sure that all names can be resolved, including reverse lookups. If your cluster is off the network, one possibility is to create a phony domain name, use a range of IP addresses reserved for private networks (such as the 192.168.0.* range often used behind NAT), and store all name information in /etc/hosts on each machine.

Configure NFS. Each machine needs to be able to access the code. You could do this by copying it out to every node, but the quickest and dirtiest way is to make your users' home directories the same on every node by mounting them from your primary machine over NFS.

Configure NIS. Unless you want to keep track of user and group info separately on each machine, set up NIS so that user accounts created on the primary machine are recognized by each node.

Configure SSH. In addition to having SSH on each machine, make sure that your users have their known hosts and authorized keys set up so that they can log in from one node to another without being prompted for a password.

Install MPI (or PVM, or another protocol). Configure MPI for your cluster.
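The DNS and MPI steps above both start from the same list of nodes. The sketch below generates /etc/hosts lines for an off-network cluster and a plain machine file from that list. The node names, the phony domain "cluster.example", and the starting address 192.168.0.10 are all hypothetical placeholders. The machine file uses the simplest format, one host name per line, which MPICH, LAM/MPI and PVM host files all accept. Check your package's documentation for extras such as CPU counts.

```python
import ipaddress
from typing import List


def hosts_lines(names: List[str], domain: str, first_ip: str) -> List[str]:
    """One /etc/hosts line per node: IP address, fully qualified name, short name."""
    start = ipaddress.ip_address(first_ip)
    lines = []
    for offset, name in enumerate(names):
        ip = start + offset  # consecutive addresses, one per node
        if not ip.is_private:
            raise ValueError(f"{ip} is not in a private address range")
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return lines


def machine_file(names: List[str]) -> str:
    # Simplest host-file form: one host name per line.
    return "\n".join(names) + "\n"


if __name__ == "__main__":
    nodes = ["head", "node1", "node2"]  # hypothetical node names
    print("\n".join(hosts_lines(nodes, "cluster.example", "192.168.0.10")))
    print(machine_file(nodes[1:]), end="")
```

Appending the same generated lines to /etc/hosts on every machine keeps forward and reverse lookups consistent without a DNS server.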

NFS

NFS handles the shared home directory. NFS is a protocol by which UNIX machines can access, or mount, each other's disks. Typically you will have one main server on which you store the user account and any code. Any time you allow machines on the net to access a disk on a machine, you run some level of security risk, and you will need to take steps to offset this risk. Your alternative is to place a copy of the code in the same location on every machine in the cluster.
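On a Linux NFS server, each export is one line in /etc/exports, and each client mounts it through a line in /etc/fstab. The sketch below builds both lines and assumes the Linux exports(5) syntax: the client is immediately followed by its options, with no space in between. The options rw, sync, root_squash and no_subtree_check are reasonable defaults for a shared home directory. The server name "head" and the subnet are hypothetical. Other UNIX variants configure exports differently.

```python
import ipaddress
from typing import Optional, Sequence

DEFAULT_OPTIONS = ("rw", "sync", "root_squash", "no_subtree_check")


def exports_line(path: str, network: str, options: Sequence[str] = DEFAULT_OPTIONS) -> str:
    """Server side: one /etc/exports entry limiting the export to one subnet."""
    ipaddress.ip_network(network)  # raises ValueError on a malformed subnet
    # exports(5) syntax: the client is immediately followed by "(options)".
    return f"{path}\t{network}({','.join(options)})"


def fstab_line(server: str, path: str, mount_point: Optional[str] = None) -> str:
    """Client side: mount the server's export at the same path on every node."""
    return f"{server}:{path}\t{mount_point or path}\tnfs\tdefaults\t0 0"


if __name__ == "__main__":
    print(exports_line("/home", "192.168.0.0/24"))  # on the server
    print(fstab_line("head", "/home"))              # on each node
```

After editing /etc/exports on the server, exportfs -ra re-reads it. Exporting to the cluster's private subnet, not to everyone, is the first of the security steps mentioned above.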

NIS

But just because another machine can mount the main disk, that doesn't mean the other machines know who the user is. You can do a couple of things here. You can create an account on every machine by hand, giving the account on each machine the same user ID (UID). Or, you can have one server store the user account information (username, password, home directory location, UID, etc.), and have all of the other machines query that server for it. NIS is a protocol by which UNIX machines share account and password information.
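On a Linux NIS client, two files usually matter. /etc/yp.conf tells ypbind which server to use, and /etc/nsswitch.conf tells the system to consult NIS after the local files. The sketch below produces the yp.conf line and patches an existing nsswitch.conf text. It assumes the classic ypbind and nsswitch formats. The NIS domain name "clusternis" and server "head" are hypothetical. Be aware that it replaces whatever sources were listed for passwd, group and shadow (for example "compat" or "sss"). On the server, the NIS maps are typically built with ypinit -m once the NIS domain name is set.

```python
from typing import Sequence

NSS_DATABASES = ("passwd", "group", "shadow")


def yp_conf(nis_domain: str, server: str) -> str:
    """Client side: the /etc/yp.conf line that binds ypbind to one NIS server."""
    return f"domain {nis_domain} server {server}\n"


def patch_nsswitch(text: str, databases: Sequence[str] = NSS_DATABASES) -> str:
    """Make the given databases consult local files first, then NIS."""
    out, seen = [], set()
    for line in text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key in databases:
            out.append(f"{key}:\tfiles nis")  # replaces the previously listed sources
            seen.add(key)
        else:
            out.append(line)  # comments and other databases pass through unchanged
    # Add any of the databases the file did not mention at all.
    out += [f"{db}:\tfiles nis" for db in databases if db not in seen]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(yp_conf("clusternis", "head"), end="")
    print(patch_nsswitch("passwd:\tfiles\ngroup:\tfiles\nhosts:\tfiles dns\n"), end="")
```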

OK, so you have decided on one machine to be a server. You have either set that machine up as an NIS server and all of your client machines as NIS clients, or you have made sure that the same user account exists on every machine by adding the user by hand.
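If you went the by-hand route, the UIDs really must match. NFS identifies file owners by numeric UID, not by name, so a user with UID 1002 on the server and 1005 on a node will not own their own files on that node. The sketch below compares /etc/passwd contents gathered from each machine (for example with ssh node cat /etc/passwd). It reports users whose UID differs and the hosts where a given user is missing. The host and user names in the test data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_passwd(text: str) -> Dict[str, int]:
    """Map user name -> UID from /etc/passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "+", "-")):
            continue  # skip blanks, comments, and NIS "compat" markers
        fields = line.split(":")
        if len(fields) >= 3:
            users[fields[0]] = int(fields[2])  # name:password:UID:...
    return users


def uid_mismatches(passwd_by_host: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Return {user: {host: uid}} for users whose UID differs between hosts."""
    uids: Dict[str, Dict[str, int]] = defaultdict(dict)
    for host, text in passwd_by_host.items():
        for user, uid in parse_passwd(text).items():
            uids[user][host] = uid
    return {user: hosts for user, hosts in uids.items() if len(set(hosts.values())) > 1}


def missing_on(passwd_by_host: Dict[str, str], user: str) -> List[str]:
    """Hosts on which the given account does not exist."""
    return [host for host, text in passwd_by_host.items() if user not in parse_passwd(text)]
```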

SSH

You still need to deal with the security issue of letting your parallel processes access the accounts on each machine without entering a password. SSH lets you create a key pair and authenticate with the public key instead of a password. The keys are typically stored in your $HOME/.ssh directory.
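With a shared NFS home directory, this only has to be done once. You append your own public key to $HOME/.ssh/authorized_keys, and every node sees the same file. The sketch below does the append step and sets the permissions OpenSSH expects: sshd's StrictModes check rejects key files that other users can write to. It assumes the key pair was already created with something like ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519. The empty passphrase is what makes non-interactive logins possible, and it is also why the key files need protecting.

```python
import os
from pathlib import Path


def authorize_key(ssh_dir: Path, public_key: str) -> bool:
    """Append public_key to ssh_dir/authorized_keys; return True if it was added."""
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by the umask, so force it
    auth = ssh_dir / "authorized_keys"
    key = public_key.strip()
    text = auth.read_text() if auth.exists() else ""
    added = key not in (line.strip() for line in text.splitlines())
    if added:
        with auth.open("a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # don't glue the new key onto an unterminated line
            f.write(key + "\n")
    os.chmod(auth, 0o600)  # readable and writable by the owner only
    return added


if __name__ == "__main__":
    ssh = Path.home() / ".ssh"
    print(authorize_key(ssh, (ssh / "id_ed25519.pub").read_text()))
```

Running it twice is harmless: the second call finds the key already present and only re-applies the permissions.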

Other Security Issues

Here is where the real security fun comes in. The single largest security hole in this setup is that a computer cracker could pretend to be a trusted machine, mount the exported user directory, change the authorized keys stored there, and thereby give themselves access to the cluster. So, IF you use the method used by many clusters of exporting disks via NFS and allowing users to authenticate via an encrypted key stored on that exported disk, you should take steps to make sure that people elsewhere on the net cannot also mount your exported disks. The simplest method is to not have your cluster on the net at all. If your cluster is on the net, only one machine should be accessible, and the cluster should be behind some sort of firewall.
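A quick audit of the server's /etc/exports can catch exports that reach beyond the cluster. The sketch below assumes the Linux exports(5) syntax. It flags clients given as "*", address ranges outside private space, and the classic typo of a space between the client and its options: "/home (rw)" exports to everyone rather than to nobody. Host names and netgroups are skipped because the script cannot judge them, so review those by hand. This is a sanity check, not a substitute for a firewall.

```python
import ipaddress
from typing import List, Tuple


def risky_exports(exports_text: str) -> List[Tuple[str, str]]:
    """Return (path, client) pairs in /etc/exports text that reach beyond private networks."""
    flagged = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        if not clients:
            flagged.append((path, "<no client list>"))  # treat as open to everyone
        for spec in clients:
            client = spec.split("(", 1)[0]
            if client in ("", "*"):
                # "*" is everyone; an empty client means "(opts)" was separated
                # from its host by a space, which also exports to everyone.
                flagged.append((path, spec))
                continue
            try:
                net = ipaddress.ip_network(client, strict=False)
            except ValueError:
                continue  # host name, wildcard name, or @netgroup: review by hand
            if not net.is_private:
                flagged.append((path, spec))
    return flagged


if __name__ == "__main__":
    with open("/etc/exports") as f:
        for path, client in risky_exports(f.read()):
            print(f"{path}: exported to {client}")
```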

If you are building a working cluster, please pay serious attention to the issue of network security.

If you are building a small test cluster, and are concerned about security, consider creating accounts on each machine and copying your programs to each account before running them.
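Copying a program out by hand gets tedious with more than a few nodes. The sketch below loops scp over a list of hosts and puts the program in the same relative directory in each account, since launchers generally expect the executable at the same path on every node. The -p flag preserves file modes, so an executable stays executable. Host names, the program name a.out and the directory bin are hypothetical. Without SSH keys, scp will prompt for each account's password in turn.

```python
import subprocess
from typing import Callable, List


def scp_command(local_path: str, host: str, remote_dir: str) -> List[str]:
    # -p keeps modification times and modes; a relative remote_dir is
    # interpreted relative to the remote account's home directory.
    return ["scp", "-p", local_path, f"{host}:{remote_dir.rstrip('/')}/"]


def copy_to_nodes(
    local_path: str,
    hosts: List[str],
    remote_dir: str,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Copy local_path into remote_dir on every host; return the hosts that failed."""
    failed = []
    for host in hosts:
        proc = runner(scp_command(local_path, host, remote_dir))
        if proc.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    print(copy_to_nodes("a.out", ["node1", "node2"], "bin"))  # hypothetical names
```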
