
Using GPFS

Do not have all (or too many) files in the same directory

In general, the GPFS architecture handles parallel I/O from many nodes well. However, it becomes very slow when different nodes try to write to exactly the same area of the same file.

The general rule is to avoid having hundreds or thousands of tasks modify the same file or directory at the same time, for example by each creating a file in it.

This happens, for instance, when all participating nodes each try to create a file in one and the same directory at job start. A directory is itself nothing but a file, so all of these creations modify the same file. The rate at which files are created this way was observed to be about one file per second!

It is strongly recommended not to do this for any larger job. As a better alternative, the files for the individual tasks can all be created by a single task. This is faster by several orders of magnitude at job start (see below).
If the nodes really do need to create their files themselves, then create subdirectories first, either one for each task or one for a (small) subset of tasks, and then let the tasks create their files within these subdirectories. The subdirectory creation should again be done by just one task.

A code using MPI should do something like the following pseudo-code:

!# serial creation
barrier
if (task==0) then
  do i=0,nprocs-1
    create subdirectory(i)
    create file(i)    !# with optional truncate option
  enddo
endif

!# all files created now
barrier

!# parallel usage
open file(myid)
open file(commonfile_id)
...
write privatefile
write commonfile
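As one possible reading of the pseudo-code above, the following Python sketch separates the serial creation step from the per-task usage. The helper names (`task_dir`, `task_file`, `create_all`, `run`), the directory and file naming scheme, and the `tasks_per_dir` parameter are assumptions made for illustration and do not come from the GPFS documentation. The driver `run` additionally assumes that the mpi4py package is available; it is only defined here, not called.

```python
import os


def task_dir(base, rank, tasks_per_dir=1):
    """Subdirectory used by a task (hypothetical naming scheme).

    tasks_per_dir > 1 puts a small subset of tasks into each subdirectory.
    """
    return os.path.join(base, f"group_{rank // tasks_per_dir:06d}")


def task_file(base, rank, tasks_per_dir=1):
    """Private file of a task inside its subdirectory."""
    return os.path.join(task_dir(base, rank, tasks_per_dir),
                        f"task_{rank:06d}.dat")


def create_all(base, nprocs, tasks_per_dir=1):
    """Serial creation: meant to be called by ONE task only."""
    for rank in range(nprocs):
        os.makedirs(task_dir(base, rank, tasks_per_dir), exist_ok=True)
        # Mode "wb" creates the file, or truncates it if it already exists.
        with open(task_file(base, rank, tasks_per_dir), "wb"):
            pass


def run(base):
    """MPI driver (assumes mpi4py is installed); mirrors the pseudo-code."""
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    comm.Barrier()
    if rank == 0:
        create_all(base, nprocs)
    comm.Barrier()  # all files have been created now

    # Parallel usage: every task opens only its own, already existing file.
    with open(task_file(base, rank), "r+b") as f:
        f.write(b"private data of task %d\n" % rank)
```

With mpi4py available, a script that calls `run` with a directory on the GPFS file system would be started through the usual MPI launcher of the site. Only rank 0 ever touches the directory metadata, so the other tasks never compete for the same directory.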

The tasks can then proceed to modify their own portions of a common file. Results are best if their regions do not overlap at a granularity smaller than the GPFS blocksize (8 MB). For fine-grained updates that are smaller than the blocksize, using MPI-IO is advised, since it uses MPI to ship the small updates to the nodes that manage the different regions of the file.
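To illustrate what non-overlapping regions at blocksize granularity can look like, here is a minimal Python sketch that pads each task's region to a whole number of blocks. The function names `aligned_region` and `write_region` and the fixed-size layout are assumptions for illustration. The sketch uses plain POSIX `pwrite` rather than MPI-IO, and the 8 MB default is simply the value quoted above, so the blocksize of the actual file system should be checked.

```python
import os

BLOCKSIZE = 8 * 1024 * 1024  # blocksize quoted in the text; verify on your system


def aligned_region(rank, bytes_per_task, blocksize=BLOCKSIZE):
    """Return (offset, length) of a task's region, padded to whole blocks.

    bytes_per_task is the largest amount any task writes, so that all
    regions have the same length and no two tasks share a block.
    """
    blocks = -(-bytes_per_task // blocksize)  # ceiling division
    length = blocks * blocksize
    return rank * length, length


def write_region(path, rank, payload, bytes_per_task, blocksize=BLOCKSIZE):
    """Write payload at the start of this task's block-aligned region."""
    offset, length = aligned_region(rank, bytes_per_task, blocksize)
    if len(payload) > length:
        raise ValueError("payload does not fit into the task's region")
    # The common file is expected to exist already (created by one task).
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(payload)
        written = 0
        while written < len(view):  # pwrite may write fewer bytes than asked
            written += os.pwrite(fd, view[written:], offset + written)
    finally:
        os.close(fd)
    return offset
```

For example, with the default blocksize a task with rank 2 that writes at most 1 MB would start at offset 16 MB. The padding wastes some file space (typically as sparse holes), which is the price for keeping every block owned by a single task.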

GPFS documentation

Is available on the IBM web site: