SC09 Paper, Source Code, Traces.
Parallel applications running across thousands of processors must protect
themselves from inevitable component failures. Many applications insulate
themselves from failures by checkpointing, a process in which they save their
state to persistent storage. Following a failure, they can resume computation
using this state. For many applications, saving this state into a shared single
file is most convenient. With such an approach, writes are often small and
not aligned with file system boundaries. Unfortunately for these
applications, this preferred data layout results in pathologically poor
performance from the underlying file system, which is optimized for large,
aligned writes to non-shared files.
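As a rough illustration of why shared-file checkpoints tend to be unaligned, the sketch below computes the offsets each process writes to in a strided single-file layout and counts how many fall off a block boundary. The strided layout, the function names, and the sizes (4 processes, 47001-byte writes, 64 KiB blocks) are all assumptions chosen for illustration, not measurements from our experiments.

```python
def strided_offsets(num_procs, write_size, writes_per_proc):
    """Yield (rank, offset) pairs for a strided shared-file checkpoint.

    Assumed layout: in each round, rank r writes write_size bytes
    directly after rank r-1, and rounds follow one another in the file.
    """
    for step in range(writes_per_proc):
        for rank in range(num_procs):
            yield rank, (step * num_procs + rank) * write_size


def unaligned_fraction(offsets, block_size):
    """Return the fraction of offsets that do not start on a block boundary."""
    offsets = list(offsets)
    unaligned = sum(1 for off in offsets if off % block_size != 0)
    return unaligned / len(offsets)


# Hypothetical sizes: 4 processes, 2 writes each, 47001 bytes per write.
offsets = [off for _, off in strided_offsets(4, 47001, 2)]
fraction = unaligned_fraction(offsets, 64 * 1024)
print(f"{fraction:.3f} of writes start off a 64 KiB boundary")
```

With these assumed sizes only the very first write begins on a block boundary; every other write starts in the middle of a block that a neighboring process also writes to.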
To address this fundamental mismatch, we have developed a parallel
log-structured file system, PLFS, which is positioned between the applications
and the underlying parallel file system. PLFS remaps an application’s write
access pattern to be optimized for the underlying file system. Through testing
on the Panasas ActiveScale Storage System and IBM’s General Parallel File System at
Los Alamos National Lab and on Lustre at Pittsburgh Supercomputing Center, we
have seen that this layer of indirection and reorganization can reduce
checkpoint time by up to several orders of magnitude for several important
benchmarks and real applications.
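The remapping idea can be sketched with a toy in-memory model: each writer appends to its own log regardless of the logical offset it targets, and an index records where each logical extent landed so that reads can reassemble the file. This is a minimal sketch under stated assumptions, not the PLFS implementation; the class `LogStructuredFile`, its methods, and the index tuple layout are hypothetical names introduced here.

```python
class LogStructuredFile:
    """Toy model: one append-only data log per writer plus a shared index."""

    def __init__(self):
        self.logs = {}   # writer id -> bytearray holding that writer's log
        self.index = []  # (logical_offset, length, writer, log_offset), in write order

    def write(self, writer, logical_offset, data):
        """Append data to the writer's own log and record the mapping."""
        log = self.logs.setdefault(writer, bytearray())
        self.index.append((logical_offset, len(data), writer, len(log)))
        log.extend(data)  # physical I/O is always a sequential append

    def read(self, offset, length):
        """Rebuild a logical byte range by replaying the index in write order."""
        out = bytearray(length)  # never-written holes read back as zeros
        for lo, ln, writer, phys in self.index:
            start = max(lo, offset)
            end = min(lo + ln, offset + length)
            if start < end:  # this extent overlaps the requested range
                src = phys + (start - lo)
                out[start - offset:end - offset] = self.logs[writer][src:src + (end - start)]
        return bytes(out)


# Two writers target interleaved logical offsets, in arbitrary order.
f = LogStructuredFile()
f.write(writer=1, logical_offset=3, data=b"def")
f.write(writer=0, logical_offset=0, data=b"abc")
print(f.read(0, 6))
```

The point of the design is that the write path never seeks or shares a physical file: small, unaligned logical writes become sequential appends to non-shared logs, and the cost of resolving the layout is deferred to the read path.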
Figure: summary of our results, which are explained in detail in our SC09 paper. The key observation is that our technique improves checkpoint bandwidths for all seven studied benchmarks and applications by up to several orders of magnitude.
We expect that PLFS can improve the checkpoint bandwidth for any large parallel
application that writes to a single file. The expected improvement is
especially large for those applications doing unaligned or random IO, patterns
which have become increasingly prevalent recently due to the widespread
adoption of complex formatting libraries such as NetCDF and HDF5.
Los Alamos National Laboratory
Carnegie Mellon University
Pittsburgh Supercomputing Center