This introduction to SHMOD describes a framework and programming style
appropriate for a wide range of parallel computational tasks. If
an algorithm can be expressed as a list of independent tasks, this
toolkit can help implement the solution in a fault-tolerant, dynamic and
efficient way. The required network, I/O and database functions are
supplied in a user library. A very simple example ("hello_shmod") is
provided to introduce the user library. A fully functional example is
also provided, illustrating one way to achieve fault tolerance,
load balancing and dynamic performance.
Requirements
C source is provided under the GNU Public License. Generic
Linux and Windows makefiles are supplied, though they may need
modification for a specific installation. The mySQL database
library is required to build SHMOD; see mySQL for
the latest release.
Overview
There are three major classes of functions in SHMOD.
0) Initialization and shutdown.
   a) start_thing( sqlhostname, sqluser, sqlpw, sqldatabasename )
   b) stop_thing()
1) Client-Server (really peer-peer) bulk high-bandwidth non-blocking remote I/O
   a) create_thing( name, hostname, hostdirectory, size )
   b) read_thing( name, localaddress, nbytes, remoteoffset, &iohandle )
   c) write_thing( name, localaddress, nbytes, remoteoffset, &iohandle )
   d) wait_thing( iohandle, waittime )
2) Global database table access to locks, and get/put of small amounts of global data.
   a) getglob_thing( name, localaddress, nbytes, &timestamp )
   b) putglob_thing( name, localaddress, nbytes )
   c) lock_thing( lockname )
   d) unlock_thing( lockname )
The SHMOD API is layered on a supporting database API and remote I/O
facility. The SHMOD remote I/O server is supplied for TCP/IP sockets
(ipriod); an older Myrinet/GM version (sriod) is also supplied
but not required. In the near future, a third alternative for network
communication will be supported which will be appropriate for
high-bandwidth transfers over the TeraGrid.
In the figure below, only one instance of a user's application and one
remote I/O server are shown. On a parallel system in use, there
would likely be many of both.
Hello World
This simple complete example is in Fortran. Both Fortran and C language
'bindings' are provided in SHMOD. See
downloads for a link to the following
annotated source in a tarball. This program does not overlap SHMOD
reads or writes of data contexts, nor does it recover from workers
acquiring work assignments and never completing them. It does, however,
illustrate a structure for cooperative update of N tasks by M workers,
for any M<=N. In the interests of simplicity, error handling is
rudimentary.
!----------------------------------------------------------------
! ... common.f: shared declarations included by each program unit below
! .... control data ....
integer work
common /cwork/ work(N)
! .... context data .....
common /cdata/ data(S)
! ... repetitious declarations
include "shmod.f"
character(16) nameof
external nameof
!----------------------------------------------------------------
Program hello_shmod
! compilation for Intel Linux: efc -FR -w95 -fpp1 helloshmod.f
! compilation for Intel windows: efl /FR /w95 /fpp1 helloshmod.f
! ... Number of pieces of work
#define N 64
! ... maximum number of workers expected
#define M 64
! ... Size in words of the context data for a piece of work
#define S 1000
! ... Single remote I/O server which serves contexts to all workers.
#define SERVERHOST "frodo.lcse.umn.edu"
#define SERVERROOT "e:/temp"
include 'common.f'
! ----------------------------------------------------------------
! ... Connect to a mySQL server as user 'lcse', pw 'demo' and use the
! database 'demodb'.
ierr= start_thing( "frodo.lcse.umn.edu", "lcse", "demo", "demodb" )
if ( ierr.ne.0 ) call handleit("Cannot start")
! ... work loop
do while (.true.)
! ... Attempt to read the control block. If it is zero length,
! instead initialize it.
if ( lock_thing("alock") .eq. 0 ) call handleit("Cannot lock")
! for dbg, always init
! ierr= getglob_thing( "work", work, sizeof(work), now )
ierr= 0
if ( ierr .ne. sizeof(work) ) then
call initialize
print *, "Initializing"
ierr= putglob_thing( "work", work, sizeof(work) )
if ( ierr .ne. sizeof(work) ) then
call handleit("Cannot initialize work glob")
endif
end if
! ...OK, what should I do next based on the 'work' assignment array?
myjob= nextwork()
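if ( myjob .lt. 0 ) then
print *, "No work left, exiting"
! ... release the table lock first so remaining workers are not blocked
if ( unlock_thing("alock") .eq. 0 ) call handleit("Cannot unlock")
call stop_thing
stop
endif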
! ... mark as assigned
mystate= work(myjob)
work(myjob) = -work(myjob)
! ... Store the updated job info and unlock the table.
if ( putglob_thing( "work", work, sizeof(work) ) .ne. sizeof(work) ) then
call handleit("Cannot putglob")
endif
if ( unlock_thing("alock") .eq. 0 ) call handleit("Cannot unlock")
print *, "doing work piece ", myjob, " state ", mystate
if ( mystate .eq. 1 ) then
call initializework(myjob)
else
! ... Retrieve the data context required for the task
! (fetch is a subroutine; it calls handleit itself on failure)
call fetch(myjob)
endif
! ... Perform the computation on the context
call dowork(myjob)
! ... Write it back to whence it came
! (writeback is a subroutine; it calls handleit itself on failure)
call writeback(myjob)
end do
call stop_thing
stop
end
! -----------------------------------------------------------------
! ... find next piece of work to do (if any). If work(i) is negative,
! it is currently assigned. Return -1 if there is no work to
! assign.
integer function nextwork()
include 'common.f'
do i= 1, N
if ( work(i) .gt. 0 ) then
nextwork= i
return
endif
enddo
nextwork= -1
return
end
! -----------------------------------------------------------------
! ... return character string which is the SHMOD 'name' of the single
! ... data context needed for this piece of work.
character(16) function nameof(mywork)
write(nameof,"('context',I4.4)" ) mywork
return
end
! -----------------------------------------------------------------
subroutine initialize
include 'common.f'
! ... Set the work status array indicating all tasks are in state '1'.
! ... There are N tasks, each independent.
do i= 1, N
work(i)= 1
enddo
return
end
! -----------------------------------------------------------------
subroutine initializework(myjob)
include 'common.f'
do i = 1, S
data(i)= 0
enddo
ierr= create_thing( nameof(myjob), SERVERHOST, SERVERROOT, 0 )
if ( ierr .ne. 0 ) then
call handleit("Cannot create")
endif
return
end
! -----------------------------------------------------------------
subroutine dowork(myjob)
include 'common.f'
! ... computational work, such as it is
do i = 1, S
data(i) = data(i) + 1
enddo
return
end
! -----------------------------------------------------------------
subroutine fetch(myjob)
include 'common.f'
if ( read_thing( nameof(myjob), data, sizeof(data), 0, iostat ) .ne. 0 ) then
call handleit("Cannot read")
endif
! ... wait for the data read.
ierr= wait_thing( iostat, -1 )
if ( ierr .ne. sizeof(data) ) then
print *, ierr
call handleit("cannot wait? Read failure")
endif
return
end
! -----------------------------------------------------------------
subroutine writeback(myjob)
include 'common.f'
if ( write_thing( nameof(myjob), data, sizeof(data), 0, iostat ) .ne. 0 ) then
call handleit("Cannot write")
endif
! ... wait for the data writeback.
if ( wait_thing( iostat, -1 ) .ne. sizeof(data) ) call handleit("Cannot wait")
! ... now we're sure the job is done, update the database.
! (same lock name and return-code test as the main loop, so that all
! updates of "work" are serialized)
if ( lock_thing("alock") .eq. 0 ) call handleit("Cannot lock")
if ( getglob_thing( "work", work, sizeof(work), now ) .ne. sizeof(work) ) then
call handleit("Cannot getglob")
endif
work(myjob) = work(myjob)+1
if ( putglob_thing( "work", work, sizeof(work) ) .ne. sizeof(work) ) then
call handleit("Cannot putglob")
endif
if ( unlock_thing("alock") .eq. 0 ) call handleit("Cannot unlock")
return
end
! -----------------------------------------------------------------
! ... All errors are fatal and uninformative, again in the interest
! ... of brevity. A real application need only say 'why' it is exiting,
! ... no other SHMOD participants will be affected should one stop.
subroutine handleit(msg)
character(*) msg
print *, "A very bad thing has happened. ",msg
stop
end
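The bookkeeping convention used by the listing above (a positive entry is available, a negative entry is assigned, and the write-back step adds one) can be restated compactly in C. This is only a sketch of the in-memory logic: the helper names next_work, claim, complete and context_name are hypothetical and not part of the SHMOD library, indices are 0-based rather than Fortran's 1-based, and the lock/getglob/putglob calls that must surround these updates in a real worker are omitted.

```c
#include <assert.h>
#include <stdio.h>

#define NTASKS 4   /* illustrative only; the Fortran listing uses N = 64 */

/* Return the index of the first available task (state > 0), or -1 if none. */
static int next_work(const int work[], int n)
{
    for (int i = 0; i < n; i++)
        if (work[i] > 0)
            return i;
    return -1;
}

/* Mark task i as assigned by negating its state; return the previous state. */
static int claim(int work[], int i)
{
    assert(work[i] > 0);
    int state = work[i];
    work[i] = -state;
    return state;
}

/* Mirror of the listing's write-back step: work(myjob) = work(myjob) + 1. */
static void complete(int work[], int i)
{
    assert(work[i] < 0);
    work[i] += 1;
}

/* Mirror of nameof(): "context" followed by a zero-padded 4-digit number. */
static void context_name(char out[16], int task)
{
    snprintf(out, 16, "context%04d", task);
}
```

With every entry initialized to 1, a task goes from 1 to -1 when claimed and to 0 after write-back, so next_work no longer offers it.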
Full Example
For the real details on how to accomplish fault tolerance and overlap
context I/O with computation, see the code:
shppm.tar.gz
This section will sketch the structure that must be implemented to
realize the full benefit of SHMOD.
For fault tolerance, the application must become a bit more clever
about what work assignment it can take for itself. When the list
of work is acquired, it will include information about previously
assigned tasks and workers, such as how long the task required last time
it was performed, and to which worker it is currently assigned. The
"lookforwork" routine can then decide to reclaim a previously assigned
task for itself.
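One possible shape for such a reclaim decision is sketched below in C. Everything here is an assumption made for illustration: the task_info record, its field names, and the rule that a task is stale once it has run longer than a slack factor times its previous duration. The actual policy is in the shppm source. In a real worker this function would run while holding the work-table lock.

```c
#include <assert.h>

/* Hypothetical per-task record; the real layout is in the shppm source. */
struct task_info {
    int    state;          /* >0 available, <0 assigned, 0 finished */
    int    owner;          /* id of the worker currently holding the task */
    double started;        /* time the current assignment began, seconds */
    double last_duration;  /* time the task required last time, seconds */
};

/* Assumed policy: an assigned task is stale once it has been running
   longer than slack times its previous duration. */
static int is_stale(const struct task_info *t, double now, double slack)
{
    return t->state < 0 && (now - t->started) > slack * t->last_duration;
}

/* Prefer an available task; otherwise reclaim a stale assigned one.
   Returns the chosen task index, or -1 if there is nothing to take. */
static int look_for_work(struct task_info tasks[], int n, int me,
                         double now, double slack)
{
    assert(n >= 0 && slack > 0.0);
    int pick = -1;
    for (int i = 0; i < n && pick < 0; i++)
        if (tasks[i].state > 0)
            pick = i;
    for (int i = 0; i < n && pick < 0; i++)
        if (is_stale(&tasks[i], now, slack))
            pick = i;
    if (pick >= 0) {
        if (tasks[pick].state > 0)
            tasks[pick].state = -tasks[pick].state;  /* mark as assigned */
        tasks[pick].owner = me;
        tasks[pick].started = now;
    }
    return pick;
}
```

Resetting the start time on reclaim keeps a second worker from immediately reclaiming the same task again.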
Overlapping context I/O is also fairly easy. The application
should 'take' three work assignments; one to pre-fetch, one to update,
and one which can be in the write-back phase. Once the 'pipeline' gets
running, being able to read & write a data context in the time it
takes to computationally update a third will allow complete overlap of
requisite communication and calculation.
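The three-assignment pipeline can be pictured as a schedule in which, at each step, one task is prefetched, the previous one is updated, and the one before that is written back. The C sketch below only computes that schedule; the names stage_plan, pipeline_steps and plan_step are hypothetical, and no SHMOD calls are made.

```c
#include <assert.h>

/* Which task occupies each pipeline stage at a given step; -1 means idle. */
struct stage_plan {
    int prefetch;   /* task whose context is being read ahead */
    int update;     /* task currently being computed on */
    int writeback;  /* task whose context is being written back */
};

/* Number of steps needed to push n tasks through the three stages. */
static int pipeline_steps(int n)
{
    return n > 0 ? n + 2 : 0;
}

/* Plan for step s (0-based) of an n-task run: task s is prefetched while
   task s-1 is updated and task s-2 is written back. */
static struct stage_plan plan_step(int s, int n)
{
    assert(n > 0 && s >= 0 && s < pipeline_steps(n));
    struct stage_plan p;
    p.prefetch  = (s < n) ? s : -1;
    p.update    = (s >= 1 && s - 1 < n) ? s - 1 : -1;
    p.writeback = (s >= 2) ? s - 2 : -1;
    return p;
}
```

In a real worker, each step would presumably start the read and the write with read_thing and write_thing, perform the update, and only then call wait_thing on the two I/O handles, so that the transfers proceed during the computation.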
Here is an overview of the structure:
Current Downloads
shmod3.tar.gz: Source for SHMOD user programming API
hello_shmod.tar.gz: Source for simple SHMOD example explained in this text
shppm.tar.gz: Source for complete sPPM application and driver
SHMOD2 PDF document describing the updated SHMOD library.
IEEE-Cluster-Computing-2004-paper: PDF article describing a large run with the PPM code implemented in the SHMOD framework.
Older Downloads:
SHMOD PDF document describing the Shared Memory on Disk (SHMOD-1) parallel programming framework.
SHMOD Version 1 (MPI+RIO) Source shmod1.tar
This program is a template 3D parallel code based on PPMLIB which is based on the concepts
described in the above paper. It is suitable for a Linux cluster
using MPI control communication and Myrinet bulk communication.
SHMOD Version 2 (RIO + Database) README template.tar
This program is a template 3D parallel code using PPMLIB and an
improved version of SHMOD.