Getting Started with Open MPI on Fedora
I recently rediscovered the world of parallel computing while wondering what to do with a bunch of mostly idle Linux boxes, all running various versions of Fedora Core Linux. I found this guide particularly useful and decided to elaborate on the subject here.
Background
Open MPI is an open-source implementation of the Message Passing Interface (MPI), which allows programmers to write software that runs on several machines simultaneously. It also allows these copies of the program to communicate and cooperate with each other to, say, share the load of an intensive calculation or daisy-chain results from one 'node' to another.
This is not new; it's been around for decades and today it is one of the main techniques used in supercomputing platforms. The basic principle is that you need two things:
- MPI development suite to build your MPI-capable applications (e.g. Open MPI)
- Client/server queue manager to distribute the programs to remote computers and return the results (e.g. TORQUE)
Both these components are distributed by the Fedora Project and are readily available.
Setting up the TORQUE Server
Prerequisites
Firstly, you will need to doctor the /etc/hosts file, placing your preferred hostname in front of "localhost" on the "127.0.0.1" line, for example:
127.0.0.1 mpimaster localhost.localdomain localhost
Package Installation
Now you will need to install the following packages, using something like yum. The torque-client package will pull in some GUI-related libraries (freetype, libX*, tcl, tk, etc.) even if you're not using X on the TORQUE server:
$ sudo yum install torque torque-client torque-server torque-mom libtorque
Server Setup
Next you will need to do some initial setup. If you get a warning that pbs_server is already running, stop it first with /etc/init.d/pbs_server stop:
$ sudo /usr/sbin/pbs_server -t create
$ sudo /usr/share/doc/torque-2.1.10/torque.setup root
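The torque.setup script builds a basic server configuration for you (a default 'batch' queue, scheduling enabled, and so on; the exact details vary between TORQUE versions). You can inspect what it did, and tweak individual settings, with qmgr. For example (the resources_default.nodes setting below is just an illustrative tweak, not something you must apply):
$ sudo qmgr -c "print server"
$ sudo qmgr -c "set queue batch resources_default.nodes = 1"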
Configuration Files
Now, create the following file and put the hostname of this server in it.
/var/torque/mom_priv/config:
$pbsserver mpimaster
Create another file; this will contain a list of all the nodes/clients we're going to be using. The parameter "np=4" describes the number of processors (or cores) available on that node. In both cases below the client is a quad-core machine, so I have set "np=4". If you need to add more nodes to your MPI cluster at a later time, this is where you configure them.
/var/torque/server_priv/nodes:
mpinode01 np=4
mpinode02 np=4
Create another config file, this time containing just the hostname of the server machine.
/var/torque/server_name:
mpimaster
Firewall Configuration
Now we update iptables to allow incoming connections to the server. Here's an example of my own configuration with the two additional lines opening up TCP/UDP ports 15000 to 15004. Once done, run $ sudo /etc/init.d/iptables restart to pick up the new settings.
/etc/sysconfig/iptables:
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15000:15004 -j ACCEPT
-A INPUT -p udp -m udp --dport 15000:15004 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
IMPORTANT: Commands are sent to the client nodes over RSH/SSH. In order to make this all work, it's assumed you've set up key-based SSH from the server to each of the client nodes.
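If you haven't set that up yet, the usual approach is to generate a key pair on the server and copy the public key out to each client; a minimal sketch, assuming the same user account exists on every node:
$ ssh-keygen -t rsa
$ ssh-copy-id mpinode01
$ ssh mpinode01 hostname
The last command should print the client's hostname without prompting for a password; repeat the ssh-copy-id step for each client node.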
All done; a quick restart of the TORQUE services and we're on to setting up our client nodes:
$ sudo /etc/init.d/pbs_server restart
$ sudo /etc/init.d/pbs_mom restart
Setting up the TORQUE Client Nodes
Quick Setup Command
Going for speed and efficiency, I devised a one-line shell command that installs and configures each of the clients, assuming you are logged in as root:
# yum -y install torque-client torque-mom && echo -e "192.168.0.240\tmpimaster" >> /etc/hosts && echo "mpimaster" >> /var/torque/server_name && echo "\$pbsserver mpimaster" >> /var/torque/mom_priv/config && /etc/init.d/pbs_mom start
Step-by-Step Setup
But basically it breaks down into the following steps:
- Install the client software:
$ sudo yum install openmpi torque-client torque-mom
- Add the server's hostname and address to the /etc/hosts file:
# echo -e "192.168.0.240\tmpimaster" >> /etc/hosts
- Set the server's hostname in the config files:
# echo "mpimaster" >> /var/torque/server_name
# echo "\$pbsserver mpimaster" >> /var/torque/mom_priv/config
- Start the service:
# /etc/init.d/pbs_mom start
Testing the Setup
From the 'mpimaster' machine, you should be able to issue the command pbsnodes -a and see the client machines connected, e.g.:
$ pbsnodes -a
mpinode01
state = free
np = 4
ntype = cluster
status = opsys=linux,uname=Linux pepe 2.6.27.12-170.2.5.fc10.i686.PAE #1 SMP Wed Jan 21 01:54:56 EST 2009 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=861421,
totmem=5359032kb,availmem=5277996kb,physmem=4146624kb,ncpus=4,loadave=0.00,netload=104310870,state=free,jobs=? 0,rectime=1235751237
mpinode02
state = free
np = 4
ntype = cluster
status = opsys=linux,uname=Linux taz 2.6.27.12-170.2.5.fc10.i686.PAE #1 SMP Wed Jan 21 01:54:56 EST 2008 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=366959,
totmem=5359048kb,availmem=5277268kb,physmem=4146640kb,ncpus=4,loadave=0.00,netload=46008061,state=free,jobs=? 0,rectime=1235751223
If you see this, congratulations, you are ready to rock! If your client nodes are not connected, check the configuration and network connectivity, and lastly check that the 'pbs_mom' service is running on each client; if in doubt, try restarting the 'pbs_mom' service.
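If a node refuses to show up, a quick way to poke at the client itself (assuming the default TORQUE paths used in this guide) is something along these lines:
$ ps -e | grep pbs_mom
$ sudo tail /var/torque/mom_logs/*
$ sudo /etc/init.d/pbs_mom restart
The MOM log usually spells out why it cannot reach the server (bad hostname, firewall, and so on).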
MPI Development
You'll need to install a couple of additional packages on your development machine:
$ sudo yum install openmpi openmpi-devel openmpi-libs
Hello World Example
Now let's start with the inevitable 'Hello World!' example:
hello.c:
#include <stdio.h>
#include <mpi.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                             /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);           /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);               /* this process's id (rank) */
    MPI_Get_processor_name(processor_name, &namelen);   /* host we are running on */

    printf("Hello World! from process %d out of %d on %s\n", rank, numprocs, processor_name);

    MPI_Finalize();
    return 0;
}
Compiling the Application
Normally we'd just use gcc to build this, but for convenience Open MPI provides mpicc, a wrapper compiler which handles the include and library paths for you:
$ mpicc hello.c -o hello
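Incidentally, if you're curious what mpicc is doing behind the scenes, Open MPI's wrapper compilers can print the underlying compiler command, including the include and library flags, without actually running it:
$ mpicc --showme hello.c -o hello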
Creating the Hostfile
In order to tell Open MPI / Torque where to run your application, we must provide it with a "hostfile", similar to the file /var/torque/server_priv/nodes we made earlier:
./myhostfile:
mpinode01 slots=4
mpinode02 slots=4
Running the Application
Now we're ready to run it for the first time. Note that in this example I did my development work on the server machine ('mpimaster'); if you try submitting an MPI job from another machine you might need a slightly different configuration.
$ mpirun --hostfile myhostfile hello
Output:
Hello World! from process 0 out of 8 on mpinode01
Hello World! from process 1 out of 8 on mpinode01
Hello World! from process 2 out of 8 on mpinode01
Hello World! from process 3 out of 8 on mpinode01
Hello World! from process 4 out of 8 on mpinode02
Hello World! from process 5 out of 8 on mpinode02
Hello World! from process 6 out of 8 on mpinode02
Hello World! from process 7 out of 8 on mpinode02
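Note that with no further options mpirun starts one process per slot listed in the hostfile (eight in this case); you can also cap the number of copies explicitly, e.g.:
$ mpirun -np 4 --hostfile myhostfile hello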
Understanding the Results
Voilà, you have just submitted an MPI task and had it execute on a number of your processors. MPI makes distributing copies of your program and communicating between them easy; however, it's up to you to use this potential to provide a real speed-up in a real-world application.
Practical Example
A really simple example is a program that operates on a set of 8 large files. Normally, running on a single processor, you would process these files sequentially. Using MPI you could load 8 copies of your program on 8 processing nodes and have each node process a different file, effectively giving you an 8-times speed-up compared to running it on a single processor.
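As a rough sketch of that idea (the file names and the process_file() routine here are hypothetical placeholders, not part of anything above), each copy of the program can simply use its MPI rank to decide which files it should work on:

#include <stdio.h>
#include <mpi.h>

/* Hypothetical placeholder for the real per-file work. */
static void process_file(const char *path)
{
    printf("processing %s\n", path);
}

int main(int argc, char *argv[])
{
    const char *files[8] = { "data0.dat", "data1.dat", "data2.dat", "data3.dat",
                             "data4.dat", "data5.dat", "data6.dat", "data7.dat" };
    int numprocs, rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each process takes every numprocs-th file, starting at its own rank. */
    for (i = rank; i < 8; i += numprocs)
        process_file(files[i]);

    MPI_Finalize();
    return 0;
}

Because the loop strides by the number of processes, the same code still works if you launch fewer than 8 copies; each process simply takes on more than one file.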
Compatibility
I've loosely tested the approach described here on different systems running Fedora Core Linux versions 8, 9 & 10. Any questions or comments are welcome!
Troubleshooting
Firstly, try the Open MPI FAQs; personally I encountered the following problems:
- mpirun appears to 'hang': caused by iptables; I just shut down iptables to resolve the issue.
- Fedora Core 7: the package sets the wrong library path in /etc/ld.so.conf (see the quick check below).
- Fedora Core 7: the package included with the distribution 'doesn't work' due to library issues.
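For the library-path problems above, a quick sanity check is to ask the dynamic linker whether it can actually find the Open MPI libraries, and to re-run ldconfig after correcting the path:
$ ldconfig -p | grep mpi
$ sudo ldconfig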
Update Note
Updated: 2nd March 2009. Oops! As Jeff Squyres pointed out in his comment below, the way I configured things in the original post meant that "mpirun" just spawned 8 processes on my localhost, not the remote nodes. I've reworked the configuration to account for this. Many thanks Jeff!