One nice, and unintended, feature of PLFS is that its index files, when augmented with timestamp information, are effectively a write trace for the logical file. Each index file contains a set of records, one for each write to the logical file; each record contains a chunk ID (which has a one-to-one mapping to an MPI rank), an offset, a length, a start time, and an end time. These "traces" can be used to replay the IO patterns, either with a replayer on a real system or in a simulation. The IO patterns can also be visualized and analyzed. Below is a set of PLFS index files as well as a set of visualizations for a variety of different workloads. The visualization tool was developed at Argonne National Laboratory by Rob Latham and Rob Ross and is apparently called "Clever Name". If you have any questions about these traces, please consult the FAQ at the bottom of the page; also, please feel free to contact us if needed.
| Checkpoint | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf |
| Plot w/out corners | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf |
| Plot w/ corners | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf | map, svg, pdf |
| 10MB | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf | trace, svg, pdf |
| 47K | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg | trace, svg |
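For anyone who wants to load these records into a script, here is a minimal sketch of the record described at the top of the page (chunk ID, offset, length, start time, end time) and of ordering records for a replay. The on-disk layout is an assumption: the parser below expects a hypothetical whitespace-separated line of `chunk_id offset length start end` and skips lines beginning with `#`. Check the self-describing header in the real trace files for the actual column order; the names `WriteRecord`, `parse_records`, and `replay_order` are made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class WriteRecord:
    chunk_id: int   # maps one-to-one to an MPI rank
    offset: int     # logical file offset, in bytes
    length: int     # number of bytes written
    start: float    # start time of the write
    end: float      # end time of the write


def parse_records(lines: Iterable[str]) -> List[WriteRecord]:
    """Parse records from an ASSUMED 'chunk_id offset length start end' layout."""
    records = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' lines are header/comment material.
        if not line or line.startswith("#"):
            continue
        cid, off, length, start, end = line.split()
        records.append(
            WriteRecord(int(cid), int(off), int(length), float(start), float(end))
        )
    return records


def replay_order(records: Iterable[WriteRecord]) -> List[WriteRecord]:
    """Order writes by start time (ties broken by chunk ID) for a replayer."""
    return sorted(records, key=lambda r: (r.start, r.chunk_id))


# Made-up sample data, not taken from any of the traces above.
sample = [
    "# chunk_id offset length start end",
    "1 1048576 1048576 0.002 0.010",
    "0 0 1048576 0.001 0.009",
]
for rec in replay_order(parse_records(sample)):
    print(rec.chunk_id, rec.offset, rec.length)
```

A real replayer would issue each write from the process that corresponds to its chunk ID; sorting by start time is just the simplest way to recover a global order from the per-process records.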
Those other traces (http://institutes.lanl.gov/data/tdata/) were collected using ltrace on each MPI rank in a job. They contain all the MPI calls as well as all the IO-related system calls, and they include trace information for every file touched in the job. These traces, by contrast, cover only the writes to one single file and are collected at the VFS layer using FUSE, which introduces some limitations. FUSE breaks large IO's into 128K page-aligned chunks, so if an application does non-page-aligned IO or IO's larger than 128K, you will see multiple IO's in the trace file, and there is no way to know what the application actually did. Also, as an artifact of being index files designed for read lookups, any overlapped IO's (i.e., multiple writes to the same offset) will not appear in the trace file.
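If you want a rough guess at the application-level writes anyway, one option is to stitch the FUSE-sized pieces back together. The sketch below is a heuristic, not something PLFS provides: it merges records from the same chunk ID when one starts at the offset where the previous one ended and begins within `max_gap` seconds of the previous end time. The field names and the `max_gap` parameter are assumptions for illustration. Note that an application which really did issue back-to-back sequential 128K writes would be merged too, so the result is only a guess.

```python
from collections import namedtuple

# Field names are illustrative; they mirror the record contents described above.
Write = namedtuple("Write", "chunk_id offset length start end")


def coalesce(records, max_gap=0.0):
    """Merge offset-contiguous, time-adjacent records from the same chunk ID."""
    merged = []
    last_by_chunk = {}  # chunk_id -> index of that chunk's latest entry in merged
    for rec in sorted(records, key=lambda r: (r.start, r.chunk_id)):
        idx = last_by_chunk.get(rec.chunk_id)
        if idx is not None:
            prev = merged[idx]
            contiguous = prev.offset + prev.length == rec.offset
            close_in_time = rec.start - prev.end <= max_gap
            if contiguous and close_in_time:
                # Extend the previous write instead of starting a new one.
                merged[idx] = prev._replace(
                    length=prev.length + rec.length,
                    end=max(prev.end, rec.end),
                )
                continue
        last_by_chunk[rec.chunk_id] = len(merged)
        merged.append(rec)
    return merged


# Made-up example: one 300K write seen as 128K + 128K + 44K pieces,
# interleaved with a single write from another process.
K = 1024
pieces = [
    Write(0, 0, 128 * K, 0.0, 0.1),
    Write(1, 300 * K, 128 * K, 0.05, 0.15),
    Write(0, 128 * K, 128 * K, 0.1, 0.2),
    Write(0, 256 * K, 44 * K, 0.2, 0.25),
]
for w in coalesce(pieces, max_gap=0.01):
    print(w.chunk_id, w.offset, w.length)
```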
Well, the pictures are pretty and the traces are really easy to get and we don't even have to anonymize them. Also, the other traces can have really bad timing overheads, whereas for N-1 files (i.e., N procs write to 1 file) PLFS speeds the writes way up, so these traces are actually gathered with negative timing overheads!
Not quite. It doesn't add any timing overhead but it does add some timing artifacts. For example, take a look at LBNL's PatternIO benchmark for 10MB IO's and 128 processes (trace, svg, pdf). Notice those crazy "staircase" looking patterns where there is a section with mostly vertical lines and then a section with long horizontal ones? Since the y-axis is offset and the x-axis is time, vertical lines are fast writes and horizontal lines are slow ones. Those slow writes (i.e., the horizontal lines) are caused by the buffer cache filling and getting flushed. They are synchronized due to an interaction between PLFS and our underlying storage system. PLFS breaks an N-1 file into N files on the underlying storage system. That underlying storage system has a configurable value for the size of its buffer cache, which was set at 512MB when these traces were collected. Since there are 8 procs per node on this system, each proc gets roughly 512MB / 8 = 64MB of that cache, which means that every proc can basically write quickly for 64MB at a time.
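The arithmetic behind that 64MB figure can be sketched in a few lines. This is a deliberately simplified model: it assumes the 512MB cache is shared evenly among the 8 procs on a node and that a proc stalls as soon as its share is full, neither of which the real system guarantees. The function names are made up for this example.

```python
MB = 1024 * 1024


def per_proc_budget(cache_bytes, procs_per_node):
    """Bytes each proc can write before the cache fills (even-split model)."""
    return cache_bytes // procs_per_node


def fast_writes_per_burst(cache_bytes, procs_per_node, write_bytes):
    """Whole writes that fit in one proc's share of the cache."""
    return per_proc_budget(cache_bytes, procs_per_node) // write_bytes


budget = per_proc_budget(512 * MB, 8)
print(budget // MB)                                  # 64
print(fast_writes_per_burst(512 * MB, 8, 10 * MB))   # 6
```

Under this model, the 10MB PatternIO run would show about six fast (vertical) writes per process between slow (horizontal) ones; compare against the actual trace before relying on that number.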
Please see the answer to the previous question.
Each mark in the graph is a write. Different colors are for different processes. The y-axis is the file offset and the x-axis is the relative time. So the height of a mark is the size of the write and the length of the mark is the duration of the write. Horizontal lines are slow writes, vertical ones are fast writes.
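To make the geometry concrete, here is a small sketch that turns a write record into the rectangle described above and computes its bandwidth. It is not the code used by the visualization tool; the field names, the `Mark` tuple, and the helper functions are assumptions for illustration only.

```python
from collections import namedtuple

# Illustrative field names mirroring the record contents described on this page.
Write = namedtuple("Write", "chunk_id offset length start end")
Mark = namedtuple("Mark", "x y width height")


def to_mark(w, t0=0.0):
    """Rectangle for one write: x is relative time, y is file offset."""
    return Mark(x=w.start - t0, y=w.offset, width=w.end - w.start, height=w.length)


def bandwidth(w):
    """Bytes per second for one write; infinite if no time elapsed."""
    duration = w.end - w.start
    return float("inf") if duration <= 0 else w.length / duration


MB = 1024 * 1024
# Made-up data: two 10MB writes, one fast and one slow.
fast = Write(0, 0, 10 * MB, 100.0, 100.05)
slow = Write(0, 10 * MB, 10 * MB, 100.05, 102.05)
t0 = min(fast.start, slow.start)

for w in (fast, slow):
    m = to_mark(w, t0)
    shape = "tall and thin (fast)" if bandwidth(w) > bandwidth(slow) else "long and flat (slow)"
    print(m.y, m.height, shape)
```

Both marks have the same height (10MB), so the only visible difference between them is their length along the time axis.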
There isn't, but I can't think why you'd need one. There is an exact one-to-one mapping; we just don't know what it is. If you're really desperate for a function to convert from chunk-id to rank, you can try: f(x) = x.
These traces are all from system data machine number 26 on this computer systems table.
Nothing.
The picture is an imperfect representation; for complete accuracy, check the trace. In general, real applications do sometimes do this, but the LANL mpi_io_test never does. There are some graphs for the LANL mpi_io_test that make it appear as if processes are seeking backwards (e.g. nonstrided 64 PE: trace, svg, pdf), but I believe this is due to the graphics package running out of colors.
You can see one here. There's a self-describing header.
Sure. But be careful; that much money could avalanche and kill you.
Please just cite this webpage.