Here is what our current models look like, and what we'd like to do with parallel models:
A "simple model" has the following structure (where typically the output time step is 1 hour, the model time step is 15 minutes (dependent upon cell size) , and each science process module takes as its inputs the current simulation time, the model time step, and the current instantaneous concentration field C (subscripted by grid column, row, level, and chemical species number), and produces the rate of change for C relative to that specific process for that time step). Each science module is responsible for interpolating other needed variables (e.g., meteorology or emissions) from external data files, usually using the INTERP3 routine. Note that an individual datum may be requested many times, that not all data in a file need be read, and that accesses do not necessarily occur in consecutive order. This mandates (or argues strongly for) direct access rather than sequential files. Data portability and network transparency mandate C files rather than Fortran files, while the computational modeling is more easily stated in Fortran. So we are in a mixed-language programming situation.
    INITIALIZE CONCENTRATION FIELD
    LOOP ON OUTPUT TIME STEPS
        LOOP ON "MODEL TIME STEPS"
            SEQUENCE OF SCIENCE PROCESSES
        END LOOP
        WRITE OUTPUT CONCENTRATION FIELD
    END LOOP
    SHUT DOWN MODEL
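To make the module interface concrete, here is a minimal Fortran sketch of what one science-process module might look like under this scheme. The routine name VDIFF, the file name MET_CRO_3D, the variable name KZ, and the include file GRIDDIMS.EXT are illustrative placeholders rather than part of any actual model; the INTERP3 and M3EXIT calls follow the I/O API calling conventions as I understand them.

      SUBROUTINE VDIFF( JDATE, JTIME, TSTEP, CGRID, CDOT )

C  Hypothetical vertical-diffusion science process:  given the current
C  simulation date and time, the model time step, and the instantaneous
C  concentration field CGRID, compute that process's contribution CDOT
C  to the rate of change of the concentrations.

      INCLUDE 'GRIDDIMS.EXT'      !  NCOLS, NROWS, NLAYS, NSPCS, etc.

      INTEGER  JDATE, JTIME, TSTEP
      REAL     CGRID( NCOLS, NROWS, NLAYS, NSPCS )
      REAL     CDOT ( NCOLS, NROWS, NLAYS, NSPCS )

      REAL     KZ( NCOLS, NROWS, NLAYS )      !  eddy diffusivity
      LOGICAL  INTERP3
      EXTERNAL INTERP3

C  Interpolate the meteorology variable this process needs to the
C  current simulation time (the file may be read many times, in
C  non-consecutive order, hence the direct-access requirement):

      IF ( .NOT. INTERP3( 'MET_CRO_3D', 'KZ', 'VDIFF',
     &                    JDATE, JTIME, NCOLS*NROWS*NLAYS, KZ ) ) THEN
          CALL M3EXIT( 'VDIFF', JDATE, JTIME,
     &                 'Could not interpolate KZ', 2 )
      END IF

C  ... compute CDOT from CGRID and KZ ...

      RETURN
      END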
There is a considerable amount of data parallelism shared among the science processes. All of the processes are parallel in the horizontal row and column dimensions; the only communication or data sharing required in the horizontal is nearest-neighbor communication during the advection process. Load balancing is a consideration for both the chemistry and the cloud microphysics processes, whose workload varies greatly from one horizontal cell to another.
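Schematically (in the same hypothetical Fortran setting as the sketch above, with CELLCHEM an invented per-cell chemistry routine), the parallelism structure looks like this:

C  Chemistry:  each horizontal cell is independent, so the double loop
C  is fully data parallel over rows and columns; but the work CELLCHEM
C  does varies strongly from cell to cell, hence the load-balancing issue.

      DO  ROW = 1, NROWS
          DO  COL = 1, NCOLS
              CALL CELLCHEM( COL, ROW, JDATE, JTIME, TSTEP, CGRID )
          END DO
      END DO

C  Horizontal advection, by contrast, updates cell (COL,ROW) using only
C  its nearest neighbors (COL-1,ROW), (COL+1,ROW), (COL,ROW-1), and
C  (COL,ROW+1), so the only horizontal communication needed is a halo
C  (ghost-cell) exchange.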
Only the "gridded" file types are actually used in our models at present. Ideally, if the model is being implemented on a data parallel platform, the I/O API should respect the data parallelism of the rest of the model.
Consider the following framework for doing air quality modeling for a two-way nested family of grids (possibly several levels deep): Let C be a coarse-grid model in which the models F are nested fine-grid models. Let the output time step of the Fs be the same as the model time step of C. Extend the functionality of READ3() and INTERP3() so that they block for data not yet available, and let "nest interaction" be a science process for C. It should read the fine-grid data from the Fs, aggregate it over the coarse grid, and then write out boundary conditions for the Fs.
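Here is a minimal sketch of what such a "nest interaction" process might look like, under the assumption that READ3() blocks until the requested time step is available. The routine name NESTINT, the file names FINE_CONC and FINE_BDY, the aggregation and boundary-extraction routines AGGREGATE and EXTRACTBDY, and the fine-grid dimensions FCOLS, FROWS, FBNDY are all hypothetical.

      SUBROUTINE NESTINT( JDATE, JTIME, TSTEP, CGRID )

C  Hypothetical nest-interaction science process for the coarse-grid
C  model C:  blocking read of the fine-grid concentrations, aggregation
C  onto the coarse grid, then boundary-condition output for the fine grid.

      INCLUDE 'GRIDDIMS.EXT'      !  coarse- and fine-grid dimensions

      INTEGER  JDATE, JTIME, TSTEP
      REAL     CGRID( NCOLS, NROWS, NLAYS, NSPCS )

      REAL     FGRID( FCOLS, FROWS, NLAYS, NSPCS )   !  fine-grid buffer
      REAL     FBDY ( FBNDY, NLAYS, NSPCS )          !  fine-grid boundary
      LOGICAL  READ3, WRITE3
      EXTERNAL READ3, WRITE3

C  This READ3 blocks until the fine-grid model has written its
C  concentrations for (JDATE,JTIME):

      IF ( .NOT. READ3( 'FINE_CONC', 'ALL', -1,
     &                  JDATE, JTIME, FGRID ) ) THEN
          CALL M3EXIT( 'NESTINT', JDATE, JTIME,
     &                 'Could not read fine-grid concentrations', 2 )
      END IF

C  Aggregate the fine-grid values over the coarse cells they cover,
C  then compute and write boundary conditions for the fine grid:

      CALL AGGREGATE( FGRID, CGRID )
      CALL EXTRACTBDY( CGRID, FBDY )

      IF ( .NOT. WRITE3( 'FINE_BDY', 'ALL',
     &                   JDATE, JTIME, FBDY ) ) THEN
          CALL M3EXIT( 'NESTINT', JDATE, JTIME,
     &                 'Could not write fine-grid boundary conditions', 2 )
      END IF

      RETURN
      END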
Each of C and the various Fs is a separate process in its own right. Scheduling is performed by the operating system, on the basis of data availability and the blocking property of the reads. The separate processes provide encapsulation for the grids, and prevent name-space pollution among all the various copies of the science processes acting on different grids, while allowing grid dimensions (etc.) to be compile-time constants. (I am presuming, for example, that there is *one* copy of the code for the convective mixing module, with grid geometry and dimensionality customized at compile time for each of the grids in the nest system by parameters in an INCLUDE file. That gives the compiler extra information useful for optimization (especially on parallel machines), while the separate processes provide independent name spaces for the linker.)
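A minimal sketch of such a per-grid INCLUDE file follows; the file name GRIDDIMS.EXT and all of the dimensions and coordinates are invented for illustration. Each grid in the nest system would be built from the same science-module source code, but with its own copy of this file.

C  GRIDDIMS.EXT -- hypothetical per-grid INCLUDE file.  Because these
C  are PARAMETERs, grid geometry and dimensionality are compile-time
C  constants the compiler can exploit, especially on parallel machines.

      INTEGER   NCOLS, NROWS, NLAYS, NSPCS
      PARAMETER ( NCOLS = 60, NROWS = 48, NLAYS = 15, NSPCS = 35 )

      REAL      XORIG, YORIG, XCELL, YCELL
      PARAMETER ( XORIG = -180000.0, YORIG = -270000.0,
     &            XCELL =   18000.0, YCELL =   18000.0 )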
This framework for doing nested modeling would have a number of advantages. I feel (personally) that it gives better encapsulation for good software engineering. The overall complexity is lower. It does not require huge real memories in which to run (it can take advantage of the swapper even on real-memory machines like Crays). It provides better opportunities for reuse, since the nest geometry, and which models are running one-way and which are running two-way, can be determined at initialization time. On the other hand, it does require tools for implementing the kind of stateless access specified by the I/O API.
-- anonymous 1993