A DAta-Parallel Programming Library for Education (DAPPLE)

David Kotz
Department of Computer Science
Dartmouth College
Hanover, NH 03755-3510

Abstract: In the context of our overall goal to bring the concepts of parallel computing into the undergraduate curriculum, we set out to find a parallel-programming language for student use. To make it accessible to students at all levels, and to be independent of any particular hardware platform, we chose to design our own language, based on a data-parallel model and on C++. The result, DAPPLE, is a C++ class library designed to provide the illusion of a data-parallel programming language on conventional hardware and with conventional compilers. DAPPLE defines Vectors and Matrices as basic classes, with all the usual C++ operators overloaded to provide elementwise arithmetic. In addition, DAPPLE provides typical data-parallel operations like scans, permutations, and reductions. Finally, DAPPLE provides a parallel if-then-else statement to restrict the scope of the above operations to partial vectors or matrices.

This research was supported under grant DUE-9352796 by the National Science Foundation ILI-LLD program.

This paper appeared in SIGCSE '95, pages 76-81, with an ACM copyright notice.



Introduction

Parallel computing, having been considered an advanced topic suitable only for graduate students, is slowly migrating into the undergraduate curriculum [Mil94]. We believe parallelism should be introduced early in the curriculum, before the habits of sequential thinking are ingrained. Indeed, we are preparing to teach it to freshmen in CS2 [JKM94]. We use a data-parallel programming model, whose single thread of control allows students to explore issues in parallel algorithms without the complexities of asynchrony, deadlock, and communication. (While these are important issues in parallel computing, we feel that it is best to allow the students to focus on the underlying parallelism first, and to postpone these other issues to a later course.)

We wanted a programming language that allowed students to experiment with parallel computing concepts without being distracted by the mechanics of parallel programming. In addition, we wanted a parallel programming language that was essentially the same as the language used by students for their sequential programming (preferably C++), was available on the computers they use, was easy to learn by beginners, and was usable by students at all levels in many kinds of courses. Although many data-parallel languages exist, including C*, Fortran90, NESL [Ble93], and HPF [Lov93], they are difficult to use, are not similar to C++, or are not easily portable to student computers.

We found many research projects designing parallel C++ variants. C** [LRV92] is perhaps the closest candidate, in that it supports a data-parallel model, but it requires a new compiler and is not yet available. pC++ [BBG+93] can also provide a data-parallel model, using only a preprocessor and library, but its syntax is a little complicated for beginners. Other data-parallel options like Presto++ [Kil92] and Compositional C++ [CK92] are also rather complex for beginners. Others, like Mentat [Gri93], CHARM++ [KK93], and COOL [CGH94], are more task-parallel than data-parallel.

Finding no suitable existing language, we decided to design and implement our own language as a set of macros and classes that extended C++. The result is DAPPLE, a DAta-Parallel Programming Library for Education. DAPPLE gains its strength from its simplicity, portability, and versatility, rather than from performance or ease of implementation on real parallel hardware. In other words, DAPPLE was optimized for pedagogical use.

In this paper, after a quick review of the data-parallel programming model, we give an overview of DAPPLE through three programming examples. A more detailed introduction to DAPPLE is available.

Data-parallel Programming

The data-parallel programming model gives the programmer a single thread of control, much as in sequential programming languages, but allows certain operations to be applied to large collections of data simultaneously. For example, the sum of two arrays may be assigned to a third array by using many virtual processors in parallel, each responsible for computing one (scalar) sum and storing it in the appropriate element of the result array.
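As an illustration, the array-sum example above can be sketched in plain, sequential C++ (our code, not DAPPLE itself); the function name and the use of std::vector are ours, standing in for DAPPLE's intVector, and the explicit loop spells out what the virtual processors do simultaneously:

```cpp
#include <vector>

// Sequential sketch of an elementwise vector sum: each "virtual
// processor" i is responsible for one scalar sum, c[i] = a[i] + b[i].
std::vector<int> elementwise_sum(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)  // in the data-parallel model,
        c[i] = a[i] + b[i];                     // all i execute at once
    return c;
}
```

In DAPPLE itself this entire loop collapses to a single statement, c = a + b;, with the overloaded operator+ performing the elementwise work.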

When the condition expression of an if() statement refers to collections, the expression is independently evaluated by every virtual processor. Those virtual processors where the condition is true execute the ``then'' clause (simultaneously), and those where the condition is false execute the ``else'' clause (simultaneously). Within each clause, only a subset of the processors are active, and only active processors participate in operations on collections. In other words, a parallel if() reduces the context of collection operations within each clause. Finally, there are other operations on entire collections, such as reducing a collection to a scalar by summing all the elements, or printing the collection.
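A minimal sequential sketch of this context-narrowing idea (ours, not DAPPLE code) evaluates the condition per element, where DAPPLE's parallel if() would hide the bookkeeping entirely:

```cpp
#include <vector>

// Sequential sketch of a parallel if(): the per-element condition plays
// the role of the set of active virtual processors. Elements where the
// condition holds take the "then" branch; the rest take the "else" branch.
void clamp_negatives(std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        bool active = (v[i] < 0);  // the parallel condition, per processor
        if (active)
            v[i] = 0;              // "then" clause: only active processors
        else
            v[i] = v[i] * 2;       // "else" clause: the complementary set
    }
}
```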

DAPPLE Programming

DAPPLE adds data-parallel concepts to C++ programming, allowing the programmer to manipulate collections of data (vectors and matrices) as described above. To illustrate these concepts and the language, we present three examples.

Pascal's triangle

Pascal's triangle is a set of rows, where the first row contains one ``1'' followed by an infinite number of ``0''s. Each entry in the next row is the sum of the entry above it and the entry above and to the left. Inductively, row i has i non-zero entries. The result (one row per line, not showing the zeros) is

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
and so forth.
Figure 1 shows part of a DAPPLE program to compute Pascal's triangle. The second statement defines an integer vector called arow, with N elements numbered 0, 1, ..., N-1. (DAPPLE supports new classes intVector, charVector, floatVector, doubleVector, and booleanVector.) This vector will soon contain one row of the triangle, but for now the elements are uninitialized. Vectors may also be initialized when defined, to a scalar, an array, another vector, or a function of the index. The third and fourth statements of Figure 1 define an N-element integer vector called VP, initialized so that element i has value i.

Figure 1 uses a parallel-if statement, ifp(), to initialize arow (for comparison, it also presents the equivalent sequential code). The ``then'' clause executes only for those virtual processors where the condition (VP == 0) is true, in this case, only virtual processor 0. Thus, it assigns and prints only arow[0]. This one element is of course the entire first row of Pascal's triangle. The ``else'' clause executes for the remaining virtual processors.

The for loop of Figure 1 computes and prints N-1 more rows. Each time through the loop we compute a new row of the triangle, in parallel, by adding the current row to itself, shifted one to the right (a zero is shifted in at the left side). Then, we print out the vector, but only elements 0 through i, i.e., the non-zero elements of this row.
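The shift-and-add step of the loop body can be sketched sequentially in plain C++ (our names; Figure 1 itself expresses this with DAPPLE vector operations):

```cpp
#include <vector>

// Sequential sketch of one iteration of the Figure 1 loop: the new row is
// the current row plus a copy of itself shifted one position to the right,
// with a zero shifted in at the left side.
std::vector<int> next_row(const std::vector<int>& row) {
    std::vector<int> shifted(row.size(), 0);
    for (std::size_t i = 1; i < row.size(); ++i)
        shifted[i] = row[i - 1];           // shift right; zero enters at left
    std::vector<int> out(row.size());
    for (std::size_t i = 0; i < row.size(); ++i)
        out[i] = row[i] + shifted[i];      // elementwise add, done in parallel
    return out;                            // by DAPPLE's overloaded operator+
}
```

Starting from the row 1 2 1 (padded with zeros), one application yields 1 3 3 1, the next row of the triangle.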

Matrix-matrix multiply

In addition to vectors, DAPPLE supports a set of Matrix classes. Figure 2 shows most of a program to multiply two integer matrices. Three matrices are defined as type intMatrix(r,c), where integers r and c specify the number of rows and columns. Note that A and B are initialized from user input using the standard iostream operator >>, overloaded by DAPPLE for matrix (or vector) input.

A nested loop computes each element of the result matrix C as an inner product (dot product) of the appropriate row of A and the appropriate column of B, demonstrating DAPPLE's capability to work with matrix slices [LRV92]. Here, A[r][_] is a row slice, representing row r of matrix A, and B[_][c] is a column slice, representing column c of matrix B. Slices may be used anywhere vectors may be used, including on the left-hand side of an assignment operator.

The function inner(v1, v2) is provided by DAPPLE, but the same operation could also be expressed as sum(v1*v2), using DAPPLE's built-in reduction function called sum().
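As a sequential sketch (plain C++, our function name rather than DAPPLE's inner()), sum(v1*v2) amounts to an elementwise multiply fused with a sum reduction:

```cpp
#include <vector>

// Sequential sketch of inner(v1, v2), i.e. sum(v1 * v2): an elementwise
// multiply followed by a sum reduction over all the products.
int inner_product_sketch(const std::vector<int>& v1,
                         const std::vector<int>& v2) {
    int total = 0;
    for (std::size_t i = 0; i < v1.size(); ++i)
        total += v1[i] * v2[i];  // v1*v2 elementwise, then sum() reduces
    return total;
}
```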

The final if() statement demonstrates a handy reduction, any(), which returns (scalar) true if and only if some element of its vector or matrix argument is non-zero. Here, its argument is the boolean matrix representing the condition (C != D), so any(C != D) is true if there is any position (i,j) where C[i,j] != D[i,j]. Although one might be tempted to write ifp(C != D) instead, that would have a different effect: the first message would be printed once for every virtual processor where C[i,j] != D[i,j], and the second message would be printed once for every virtual processor where C[i,j] == D[i,j].
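Sequentially, the any() reduction is just a short-circuiting scan over the flattened collection; a sketch in plain C++ (our names, with an int vector standing in for DAPPLE's boolean matrix):

```cpp
#include <vector>

// Sequential sketch of the any() reduction: returns true if and only if
// some element of the (flattened) boolean collection is non-zero.
bool any_sketch(const std::vector<int>& flags) {
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags[i] != 0)
            return true;  // one non-zero element suffices
    return false;
}
```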

Quicksort

To demonstrate DAPPLE's ability to manipulate data within a vector, and in particular its ability to dynamically narrow context to a subset of the virtual processors, we devised a simple recursive implementation of quicksort (Figure 3). The quicksort procedure recursively sorts the active portion of its vector argument. (Initially, quicksort is called with all processors active.) It begins by using the reduction n_active() to find the size of the subvector it is to sort. It then dispenses with two special cases: subvectors of size 0 or 1 are trivially sorted, and a subvector of size 2 may require a swap. (It uses the reductions min_value(), max_value(), and first() to compute the minimum and maximum values and assign them to the appropriate elements.) Otherwise, it partitions and recurses. To partition, it chooses a splitter value (here, the value at the first active processor), builds a permutation subvector that specifies the destination of every element in the repartitioned subvector, and then permutes. It restricts the context to the left partition and recurses, and then restricts the context to the right partition and recurses.
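The partition step can be sketched sequentially in plain C++ (our names and a full vector rather than an active subvector; Figure 3 builds the permutation with parallel operations over only the active processors):

```cpp
#include <vector>

// Sequential sketch of the partition step: given a splitter value, compute
// a destination index (a "permutation vector") for every element -- values
// smaller than the splitter pack to the left, the rest to the right -- and
// then permute the elements to those destinations.
std::vector<int> partition_sketch(const std::vector<int>& v, int splitter) {
    std::vector<int> dest(v.size());
    int left = 0;                              // next free slot on the left
    for (std::size_t i = 0; i < v.size(); ++i) // first pass: small elements
        if (v[i] < splitter)
            dest[i] = left++;
    int right = left;                          // right partition starts here
    for (std::size_t i = 0; i < v.size(); ++i) // second pass: the rest
        if (v[i] >= splitter)
            dest[i] = right++;
    std::vector<int> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[dest[i]] = v[i];                   // the permutation step
    return out;
}
```

With the splitter taken from the first element, partitioning {5, 3, 8, 1, 5, 2} around 5 yields {3, 1, 2, 5, 8, 5}; quicksort would then recurse on the left three elements and on the right three elements separately.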

The quicksort example demonstrates one weakness of DAPPLE, its inability to support nested data parallelism [Ble93]. The two recursive calls to quicksort() must be done sequentially, each with only a small subset of the virtual processors active. Given this model, other sorting algorithms would be more appropriate. Exploring this issue would be a valuable lesson for students.

Summary and Status

This section has been updated since the SIGCSE paper.

The DAPPLE extensions to C++ are summarized in Table 1. We have used the language in a one-week module of an existing CS2 course at Dartmouth (complex factors have made it impossible to develop the full course that we had planned [JKM94]). Further information about that experience will be presented at the Wellesley Forum, but the lectures are on-line. DAPPLE should be useful beyond that course, however, in other courses and in other institutions.

Full information about DAPPLE is on the web, including code, documentation, tutorial, and examples. In particular, the code is available for several platforms.

Acknowledgements

Many thanks to all of those who made suggestions about the language or this paper, or helped with subtle points of C++ technique, including Owen Astrachan, Tom Cormen, Fillia Makedon, Takis Metaxas, Nils Nieuwejaar, Sam Rebelsky, Scott Silver, and Cliff Stein.

Bibliography