Parallel-I/O Anecdote

I recently developed an out-of-core solver for the Connection Machine CM5. Even though there are no published numbers yet, I thought I'd let you know about it in response to your message in comp.parallel. (I merely wish to share this experience, not to be quoted as giving hard official numbers for the CM5 parallel I/O system.)

I solved a 76800 by 76800 double complex system of equations (using Gaussian elimination with full column pivoting) on a 64-node CM5 with an SDA (Scalable Disk Array) system of 150 disks, each with 1 Gbyte capacity. The block size I used was 512, hence the total amount of data transferred during factorization was N^3/(3b)*16 = 4.72e+12 bytes (N=76800, b=512 and 16 bytes per data element). The I/O time was 12.6 hours, which means about 100 Mbytes/sec in aggregate, or 0.69 Mbytes/sec per disk. The I/O calls are made from the high-level CMFortran (F90). The transfer rate per disk can be as high as 1.5 Mbytes/sec on read, but 0.69 Mbytes/sec is what I observed on this run. On other runs (N=51200 on 64 nodes with 64 SDA disks) I got 1 Mbyte/sec per disk.
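The I/O figures above can be rechecked with a few lines of Python. This is only a back-of-envelope sketch: the variable and function names are made up for illustration, and it assumes 1 Mbyte = 10^6 bytes, which is the reading that reproduces the 0.69 figure.

```python
# Back-of-envelope check of the I/O numbers quoted for the N=76800 run.
# All names here are illustrative; 1 Mbyte is assumed to be 1e6 bytes.

N = 76800               # matrix dimension
b = 512                 # block size used by the out-of-core solver
BYTES_PER_ELEMENT = 16  # double complex
IO_HOURS = 12.6         # reported I/O time
DISKS = 150             # disks in the SDA


def io_volume_bytes(n, block, elem_bytes):
    """Data moved during factorization, using the N^3/(3b) * 16 estimate."""
    return n**3 / (3 * block) * elem_bytes


total_bytes = io_volume_bytes(N, b, BYTES_PER_ELEMENT)
aggregate_rate = total_bytes / (IO_HOURS * 3600)  # bytes/sec over all disks
per_disk_rate = aggregate_rate / DISKS            # bytes/sec per disk

print(f"total data moved : {total_bytes:.3e} bytes")
print(f"aggregate rate   : {aggregate_rate / 1e6:.1f} Mbytes/sec")
print(f"per-disk rate    : {per_disk_rate / 1e6:.2f} Mbytes/sec")
```

The estimate scales as N^3/b, so doubling the block size would halve the data moved, memory permitting.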

The important thing in my mind is that a large SDA system coupled with a rather small CM5 allowed me to solve very large problems with high performance. The 76800 problem I mentioned above took a total of 94.6 hours (on a 64-node CM5 with a 150-disk SDA system) to factor, at a rate of 3.55 Gflops (8/3 N^3 operations for double complex data).
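The Gflops figure follows directly from the operation count and the wall-clock time. The sketch below redoes that arithmetic and also works out what share of the run the reported 12.6 hours of I/O represents. The names are illustrative, and the script assumes the I/O time is counted inside the 94.6 hour total, which the text does not state explicitly.

```python
# Rough check of the factorization rate for the N=76800 run.
# Assumption: the 12.6 hours of I/O are part of the 94.6 hour total.

N = 76800
TOTAL_HOURS = 94.6  # reported total factorization time
IO_HOURS = 12.6     # reported I/O time

flops = 8.0 / 3.0 * N**3                      # operation count, double complex
rate_flops = flops / (TOTAL_HOURS * 3600)     # sustained flops/sec
io_fraction = IO_HOURS / TOTAL_HOURS          # share of wall-clock time in I/O

print(f"operations     : {flops:.3e}")
print(f"sustained rate : {rate_flops / 1e9:.2f} Gflops")
print(f"I/O share      : {io_fraction:.1%} of total time")
```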

Even though improvements are needed (and happening), I think SDA I/O does not get in the way of good performance on the CM5 and seems to provide good scalability.

-- anonymous 1994