SHMOD

This introduction to SHMOD describes a framework and programming style appropriate for a wide range of parallel computational tasks. If an algorithm can be expressed as a list of independent tasks, this toolkit can help implement the solution in a fault-tolerant, dynamic and efficient way. The required network, I/O and database functions are supplied in a user library. A very simple example ("hello_shmod") is provided to introduce the user library. A fully functional example is also provided, illustrating one way to achieve fault tolerance, load balancing and dynamic performance.

Requirements

C source is provided under the GNU Public License. Generic Linux and Windows makefiles are supplied, though they may need modification for a specific installation. The mySQL database library is required to build SHMOD; see mySQL for the latest release.

Overview

There are three major classes of functions in SHMOD.

0) Initialization and shutdown.

a) start_thing( sqlhostname, sqluser, sqlpw, sqldatabasename )
b) stop_thing()

1) Client-server (really peer-to-peer) bulk, high-bandwidth, non-blocking remote I/O

a) create_thing( name, hostname, hostdirectory, size )
b) read_thing( name, localaddress, nbytes, remoteoffset, &iohandle )
c) write_thing( name, localaddress, nbytes, remoteoffset, &iohandle)
d) wait_thing( iohandle, waittime )

2) Global database table access to locks, and get/put of small amounts of global data.

a) getglob_thing( name, localaddress, nbytes, &timestamp )
b) putglob_thing( name, localaddress, nbytes )
c) lock_thing( lockname )
d) unlock_thing( lockname )

The SHMOD API is layered on a supporting database API and remote I/O facility. The SHMOD remote I/O server is supplied for TCP/IP sockets (ipriod); an older Myrinet/GM version (sriod) is also supplied but not required. In the near future, a third alternative for network communication will be supported, appropriate for high-bandwidth transfers over the TeraGrid. In the figure below, only one instance of a user's application and one remote I/O server are shown. In use on a parallel system, there would likely be many of both.

Hello World

This simple complete example is in Fortran. Both Fortran and C language 'bindings' are provided in SHMOD. See downloads for a link to the following annotated source in a tarball. This program does not overlap SHMOD reads or writes of data contexts, nor does it recover from workers acquiring work assignments and never completing them. It does, however, illustrate a structure for cooperative update of N tasks by M workers, for any M<=N. In the interests of simplicity, error handling is rudimentary.



!----------------------------------------------------------------

! .... control data ....

integer work

common /cwork/ work(N) 



! .... context data .....

common /cdata/ data(S)



! ... repetitious declarations

include "shmod.f"

character(16) nameof

external nameof



!----------------------------------------------------------------



Program hello_shmod



! compilation for Intel Linux: efc -FR -w95 -fpp1 helloshmod.f

! compilation for Intel windows: efl /FR /w95 /fpp1 helloshmod.f



! ... Number of pieces of work

#define N 64



! ... maximum number of workers expected

#define M 64



! ... Size in words of the context data for a piece of work

#define S 1000



! ... Single remote I/O server which serves contexts to all workers.

#define SERVERHOST "frodo.lcse.umn.edu"

#define SERVERROOT "e:/temp"



include 'common.f'



! ----------------------------------------------------------------

! ... Connect to a mySQL server as user 'lcse', pw 'demo' and use the

!    database 'demodb'.



ierr= start_thing( "frodo.lcse.umn.edu", "lcse", "demo", "demodb" )

if ( ierr.ne.0 ) call handleit("Cannot start")



! ... work loop



do while (.true.)



    ! ... Attempt to read the control block. If it is zero length,

    !    instead initialize it.



    if ( lock_thing("alock") .eq. 0 ) call handleit("Cannot lock")



    ! ... To force re-initialization while debugging, set ierr= 0 here instead.

    ierr= getglob_thing( "work", work, sizeof(work), now )



    if ( ierr .ne. sizeof(work) ) then

       call initialize

       print *, "Initializing"

       ierr= putglob_thing("work",work,sizeof(work) )

       if ( ierr .ne. sizeof(work) ) then

          call handleit("Cannot initialize work glob")

       endif

    end if



! ...OK, what should I do next based on the 'work' assignment array?

    myjob= nextwork()

    if ( myjob .lt. 0 ) then

       print *, "No work left, exiting"

       call stop_thing

       stop

    endif



! ... mark as assigned

    mystate= work(myjob)

    work(myjob) = -work(myjob)



! ... Store the updated job info and unlock the table.

    if ( putglob_thing( "work",work,sizeof(work) ) .ne. sizeof(work) ) then

       call handleit("Cannot putglob")

    endif

    if ( unlock_thing("alock") .eq. 0 ) call handleit("Cannot unlock")



    print *, "doing work piece ", myjob, " state ", mystate

          

    if ( mystate .eq. 1 ) then

       call initializework(myjob)

    else

!     ... Retrieve the data context required for the task

!     ... fetch is a subroutine; it stops via handleit on any failure

      call fetch(myjob)

    endif

  

! ... Perform the computation on the context

    call dowork(myjob)



! ... Write it back to whence it came

!     (writeback is a subroutine; it stops via handleit on any failure)

    call writeback(myjob)



 end do



call stop_thing

stop

end



! -----------------------------------------------------------------

! ... find next piece of work to do (if any).  If work(i) is negative,

!     it is currently assigned.  Return -1 if there is no work to

!     assign.



integer function nextwork()

include 'common.f'



do i= 1, N

   if ( work(i) .gt. 0 ) then

    nextwork= i

    return

   endif

enddo



nextwork= -1

return

end



! -----------------------------------------------------------------

! ... return character string which is the SHMOD 'name' of the single

! ... data context needed for this piece of work.



character(16) function nameof(mywork)

write(nameof,"('context',I4.4)" ) mywork

return

end



! -----------------------------------------------------------------

subroutine initialize

include 'common.f'



! ... Set the work status array indicating all tasks are in state '1'.

! ... There are N tasks, each independent.



do i= 1, N

    work(i)= 1

enddo



return

end



! -----------------------------------------------------------------

subroutine initializework(myjob)

include 'common.f'



do i = 1, S

   data(i)= 0

enddo



ierr= create_thing( nameof(myjob), SERVERHOST, SERVERROOT, 0 )

if ( ierr .ne. 0 ) then

    call handleit("Cannot create")

endif



return

end





! -----------------------------------------------------------------

subroutine dowork(myjob)

include 'common.f'



!     ... computational work, such as it is

    do i = 1, S

       data(i) = data(i) + 1

    enddo

      

return

end



! -----------------------------------------------------------------

subroutine fetch(myjob)

include 'common.f'



if ( read_thing( nameof(myjob),data,sizeof(data),0,iostat ) .ne. 0 ) then

    call handleit("Cannot read")

endif



! ... wait for the data read.

ierr= wait_thing( iostat, -1 )

if ( ierr .ne. sizeof(data) ) then

   print *, ierr

   call handleit("cannot wait?  Read failure")

endif



return

end



! -----------------------------------------------------------------

subroutine writeback(myjob)

include 'common.f'



if ( write_thing( nameof(myjob),data,sizeof(data),0,iostat ) .ne. 0 ) then

    call handleit("Cannot write")

endif



! ... wait for the data writeback.

if ( wait_thing( iostat, -1 ) .ne. sizeof(data) ) call handleit("Cannot wait")



! ... now we're sure the job is done, update the database.

! ... use the same lock name as the main loop so that every update of

! ... 'work' excludes every other; lock/unlock return 0 on failure.

if ( lock_thing("alock") .eq. 0 ) call handleit("Cannot lock")

if ( getglob_thing( "work", work, sizeof(work), now ) .ne. sizeof(work) ) then

   call handleit("Cannot getglob")

endif



work(myjob) = work(myjob)+1

if ( putglob_thing( "work", work, sizeof(work) ) .ne. sizeof(work) ) then

    call handleit("Cannot putglob")

endif



if ( unlock_thing("alock") .eq. 0 ) call handleit("Cannot unlock")



return

end



! -----------------------------------------------------------------

! ... All errors are fatal and uninformative, again in the interest

! ... of brevity.  A real application need only say 'why' it is exiting,

! ... no other SHMOD participants will be affected should one stop.



subroutine handleit(msg)

character(*) msg

print *, "A very bad thing has happened.   ",msg

stop

end
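
The work-array convention used by hello_shmod can be restated compactly in C. The following is only a sketch under stated assumptions: the function names next_work, claim_work and complete_work are hypothetical and are not part of the SHMOD library, indices are 0-based rather than 1-based, and the SHMOD calls themselves are omitted. In the Fortran listing the claim and completion steps each happen between lock_thing and unlock_thing, with the array read by getglob_thing and stored by putglob_thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the hello_shmod work array (not SHMOD API):
 *   work[i] >  0 : task i is available; the value is its state
 *   work[i] <  0 : task i is currently assigned to some worker
 *   work[i] == 0 : task i has been completed and written back
 */

/* Return the index of the first available task, or -1 if none is left. */
int next_work(const int *work, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (work[i] > 0)
            return (int)i;
    }
    return -1;
}

/* Mark task `job` as assigned by negating its state.
 * Returns the state the task had before it was claimed. */
int claim_work(int *work, size_t n, int job)
{
    assert(job >= 0 && (size_t)job < n);
    assert(work[job] > 0);          /* only available tasks may be claimed */
    int state = work[job];
    work[job] = -state;
    return state;
}

/* Record completion the way writeback does: add 1 to the negated state,
 * so a task claimed in state 1 ends at 0 and is never handed out again. */
void complete_work(int *work, size_t n, int job)
{
    assert(job >= 0 && (size_t)job < n);
    assert(work[job] < 0);          /* only assigned tasks may complete */
    work[job] += 1;
}
```

A worker would call next_work and claim_work while holding the lock, release it, do the computation, and call complete_work under the lock again once the write-back has finished.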

Full Example

For the real details on how to accomplish fault tolerance and overlap context I/O with computation, see the code in shppm.tar.gz. This section sketches the structure that must be implemented to realize the full benefit of SHMOD.

For fault tolerance, the application must become a bit more clever about what work assignment it can take for itself. When the list of work is acquired, it will include information about previously assigned tasks and workers, such as how long the task required last time it was performed, and to which worker it is currently assigned. The "lookforwork" routine can then decide to reclaim a previously assigned task for itself.
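
One possible reclaim rule is sketched below in C. Everything in it is an assumption made for illustration: the task_info record, its field names, the can_take function and the slack factor are hypothetical and are not taken from shppm or the SHMOD library. The idea is simply that a task held by another worker becomes fair game once it has been held noticeably longer than it took the last time it was performed.

```c
#include <assert.h>

/* Hypothetical per-task record; the field names are illustrative only. */
typedef struct {
    int    owner;          /* worker id holding the task, or -1 if unassigned */
    int    done;           /* nonzero once the task has been written back     */
    double assigned_at;    /* time the current owner took the task, seconds   */
    double last_duration;  /* time the task needed last time, 0 if unknown    */
} task_info;

/* Decide whether a worker may take task `t` at time `now`.
 * `slack` (>= 1) is how many multiples of the previous duration the
 * current owner is allowed before the task may be reclaimed. */
int can_take(const task_info *t, double now, double slack)
{
    assert(slack >= 1.0);
    if (t->done)
        return 0;                   /* nothing left to do */
    if (t->owner < 0)
        return 1;                   /* never assigned: take it */
    if (t->last_duration <= 0.0)
        return 0;                   /* no history: do not reclaim */
    return (now - t->assigned_at) > slack * t->last_duration;
}
```

With a slack of 2, a task that took 10 seconds last time would be reclaimed only after its current owner has held it for more than 20 seconds. As with any update of the work list, the decision and the resulting reassignment would need to happen while the lock is held.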

Overlapping context I/O is also fairly easy. The application should 'take' three work assignments; one to pre-fetch, one to update, and one which can be in the write-back phase. Once the 'pipeline' gets running, being able to read & write a data context in the time it takes to computationally update a third will allow complete overlap of requisite communication and calculation.
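
The three-assignment pipeline can be pictured as a schedule. The C sketch below is an illustration only: the names pipeline_step, pipeline_length and pipeline_at are hypothetical, and it assumes for simplicity that a worker processes tasks 0..n-1 in order, whereas a real worker would pipeline whichever tasks it has claimed. At each step the worker starts a read for one task, computes on the task fetched in the previous step, and writes back the task computed in the step before that.

```c
#include <assert.h>

/* Which task occupies each stage at one pipeline step; -1 means idle. */
typedef struct {
    int prefetch;   /* task whose context is being read        */
    int update;     /* task being updated computationally      */
    int writeback;  /* task whose context is being written out */
} pipeline_step;

/* Number of steps needed to push n tasks through the three stages:
 * two extra steps drain the update and write-back stages. */
int pipeline_length(int n)
{
    assert(n >= 0);
    return n > 0 ? n + 2 : 0;
}

/* Stage occupancy at step `step` (0-based) for n tasks. */
pipeline_step pipeline_at(int step, int n)
{
    assert(step >= 0 && step < pipeline_length(n));
    pipeline_step s;
    s.prefetch  = (step < n) ? step : -1;
    s.update    = (step >= 1 && step - 1 < n) ? step - 1 : -1;
    s.writeback = (step >= 2) ? step - 2 : -1;
    return s;
}
```

In SHMOD terms, the prefetch and write-back stages would correspond to read_thing and write_thing requests issued before the computation starts, with wait_thing called on their handles after it ends, so that the transfers proceed while the update runs.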

Here is an overview of the structure:

Current Downloads

shmod3.tar.gz Source for SHMOD user programming API
hello_shmod.tar.gz Source for simple SHMOD example explained in this text
shppm.tar.gz Source for complete sPPM application and driver

SHMOD2 PDF document describing the updated SHMOD library.
IEEE-Cluster-Computing-2004-paper
PDF article describing a large run with the PPM code implemented in the SHMOD framework.

Older Downloads:

SHMOD PDF document describing the Shared Memory on Disk (SHMOD-1) parallel programming framework.

SHMOD Version 1 (MPI+RIO) Source
shmod1.tar
This program is a template 3D parallel code based on PPMLIB which is based on the concepts described in the above paper. It is suitable for a Linux cluster using MPI control communication and Myrinet bulk communication.

SHMOD Version 2 (RIO + Database) README
template.tar
This program is a template 3D parallel code using PPMLIB and an improved version of SHMOD.

Shared Memory Object Library

A Framework for Cluster and Grid Computing

Primer, Examples and Programming API

Version 3, June 1, 2003

Sarah Anderson, David Porter and Paul R. Woodward {sea,dhp,paul}@lcse.umn.edu

Digital Technology Center, 499 Walter Library, 117 Pleasant Street S.E. , Minneapolis, Minnesota 55455