PROGRAMMING TOOLS AND ENVIRONMENTS

Compiler Projects Help Overcome Challenges
of Parallel Applications

Compilers are probably the most important tools in the programmer's toolbox. A compiler translates the more human-friendly words and symbols of programming languages such as Fortran or C, two of the more common languages for scientific applications, into the binary code that computer processors understand. In the world of high-performance computing, parallel computers with hundreds or even thousands of processors present a challenge to a programmer, who most of the time must painstakingly write detailed instructions to manage all the processors. Several compiler projects in NPACI's Programming Tools and Environments thrust area, however, are facing this parallel programming challenge head on.

The most common tool for writing parallel programs is the Message Passing Interface (MPI). MPI provides instructions for exchanging messages and data between processors, but the programmer is responsible for keeping track of what information needs to go where. By adding different language constructs or analyzing a program's structures, compilers have the potential to ease the burden on the programmer.
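To make that bookkeeping concrete, here is a minimal sketch of the explicit message-passing style. It does not use MPI itself: two processors are imitated with Python threads, and `mailboxes`, `send`, and `recv` are hypothetical stand-ins for a message-passing library. The point is that every transfer needs a send on one side and a matching receive on the other, and the programmer has to keep the pair consistent.

```python
import queue
import threading

NUM_PROCS = 2
# One mailbox per simulated processor (hypothetical stand-in for a message library).
mailboxes = [queue.Queue() for _ in range(NUM_PROCS)]
results = {}


def send(dest, tag, payload):
    """Deliver a tagged message to the mailbox of processor `dest`."""
    mailboxes[dest].put((tag, payload))


def recv(rank, tag):
    """Take the next message from this processor's mailbox and check its tag."""
    got_tag, payload = mailboxes[rank].get(timeout=5)
    if got_tag != tag:
        raise RuntimeError(f"processor {rank} expected {tag!r}, got {got_tag!r}")
    return payload


def worker(rank):
    local = [rank * 10 + i for i in range(3)]  # this processor's share of the data
    partner = 1 - rank
    # The programmer must remember both halves of the exchange:
    send(partner, "boundary", local[-1])   # 1. ship my last element to my partner
    neighbor = recv(rank, "boundary")      # 2. collect the element my partner shipped
    results[rank] = sum(local) + neighbor


def run_exchange():
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_PROCS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)
```

Dropping either the `send` or the `recv` line in `worker` leaves the partner waiting for a message that never arrives, which is the kind of error a compiler with knowledge of the parallel constructs could rule out.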

P-LANGUAGES

TITANIUM

OUT OF CORE I/O

ADR COMPILER

Figure 1. Molecular Dynamics with Pfortran

EulerGROMOS, a molecular dynamics program developed using Pfortran, has been used to simulate acetylcholinesterase in water with 130,000 atoms. At the time, this represented the largest-ever molecular dynamics simulation of a biological system. The work is described in the Journal of the American Chemical Society, 119(40): 9513-9522 (1997).

P-LANGUAGES

"There are different approaches to parallel languages," said Ridgway Scott, professor of computer science at the University of Chicago. "There is as much diversity in parallelism as there is in normal programming languages. This leads to a matrix of possibilities and degrees of parallelism support ranging from explicit to implicit."

Scott is leading the P-languages project, which extends a serial language with a set of parallel constructs. In the world of parallel applications, the P-languages are best suited for the category of applications that have very irregular, or "gridless," structures in their underlying representations.

"For example, molecules move around randomly, leaving no natural mesh to compute on," Scott said. "Molecular science provides us good test cases, and we're familiar with them." Scott also leads a project on Enhanced Imaging of Biological Structures from NPACI's Molecular Science thrust area.

The P-language extension is explicitly parallel, but it fuses the paired sends and receives of MPI into single operations. "The compiler handles the drudgery," Scott said. "It's a more elegant solution that helps avoid errors." The extensions are also very similar regardless of the language being extended. Version 2.0 of the P-languages software was recently installed on NPACI's IBM SP and CRAY T3E at SDSC, becoming the first tool from the Programming Tools and Environments thrust area to be installed for production use.
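The idea of fusing the two halves of a transfer can be sketched in ordinary Python. This is not P-language syntax, which the article does not show. `fused_copy` and `ring_shift` are hypothetical names, and the processors are simulated as a list of dictionaries. What the sketch illustrates is that a single statement names both the source and the destination, so a send can never be left without its matching receive.

```python
def fused_copy(memories, src, src_name, dst, dst_name):
    """Copy one named value from processor `src` to processor `dst` in one step.

    Hypothetical helper: both ends of the transfer appear in the same call.
    """
    memories[dst][dst_name] = memories[src][src_name]


def ring_shift(values):
    """Each simulated processor passes its value to the next one around a ring."""
    n = len(values)
    # Private memory of each simulated processor.
    memories = [{"mine": v, "from_left": None} for v in values]
    for rank in range(n):
        fused_copy(memories, rank, "mine", (rank + 1) % n, "from_left")
    return [m["from_left"] for m in memories]
```

For example, `ring_shift([10, 20, 30])` returns `[30, 10, 20]`: processor 0 receives the value of processor 2, processor 1 that of processor 0, and so on.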

Within the thrust area, Scott's group is also working with KeLP and Meta-Chaos. KeLP and the P-languages offer complementary parallel programming techniques that don't interfere with one another. KeLP provides automation at a higher level and is designed for problems that can be expressed in terms of grids. Meta-Chaos provides the framework for allowing a program to use KeLP and the P-languages simultaneously.

The main application focus for the P-languages within NPACI is in molecular science. With researchers at the University of Houston, the Baylor College of Medicine, and UC San Diego, Scott's team is working toward atomic-level reconstruction of molecular structures. "We've seen in our collaboration with molecular scientists that, as they try larger problems, they also want to try new algorithms to look at new features and at greater detail," Scott said.

P-LANGUAGES
PROJECT LEADER
Ridgway Scott, University of Chicago

PARTICIPANTS
Terry Clark, Ernesto Gomez
University of Chicago
Jian Zhang, University of Houston

COLLABORATIONS
Programming Tools and Environments
KeLP
Meta-Chaos
Molecular Science
Enhanced Imaging of Biological Structures

TITANIUM

Titanium, a compiler project from UC Berkeley led by Susan Graham, professor of computer science and NPACI's chief computer scientist, is tackling the problem of optimizing parallel code. Titanium is an explicitly parallel object-oriented language based on Java that is best suited for grid-based computations, in particular "adaptive mesh refinement" problems common in engineering applications.

"Our compiler does quite a bit of analysis on the parallelism constructs to optimize both communication and the serial portions of the code," said Katherine Yelick, associate professor of computer science at UC Berkeley. "This is one of the advantages of using a parallel language, rather than a library of parallelism constructs such as MPI. When compiling MPI programs, the compiler has no information about the library calls and cannot optimize them or the code around them."

The Titanium compiler has analyses and optimizations for synchronization, access to shared variables, re-use of remote values, caches, and overlapping communication. In addition, the group is working on better support for dynamic data structures and alternatives to garbage collection for automatic memory management. Titanium currently runs on the Network of Workstations at UC Berkeley and SMP systems. The group is also planning ports to the CRAY T3E, the Tera MTA, and the IBM SP.

The most active users of Titanium are in Phillip Colella's group at the National Energy Research Scientific Computing Center, an NPACI affiliate partner. Colella's team is using Titanium for computational fluid dynamics algorithms. The Titanium team is also working with UC Berkeley astrophysicists Richard Klein and Chris McKee on the use of Titanium in simulations of star formation.

In the Programming Tools and Environments thrust area, the Titanium group is also collaborating with the KeLP project. They are comparing the performance of applications written in KeLP against those written in Titanium, as well as considering possibilities such as rewriting KeLP in Titanium and calling KeLP from Titanium.

"Both Titanium and KeLP address a similar application domain," Yelick said. "The KeLP group has built up a rich set of abstractions for problems such as load balancing adaptive meshes, while Titanium performs some platform-specific optimizations. We're looking at how to take the best features from both."

TITANIUM
PROJECT LEADER
Susan Graham, UC Berkeley
NPACI Chief Computer Scientist

PARTICIPANTS
Kathy Yelick, Greg Balls
UC Berkeley

COLLABORATIONS
Programming Tools and Environments
KeLP

OUT OF CORE I/O

In most applications, the program performs computations on data structures stored in a computer's memory, its RAM. But as scientists tackle larger and larger problems, they are faced with applications that must deal with data structures that are too large to be stored in memory. This situation is called "out-of-core I/O," where core is an antiquated term for main memory and I/O is shorthand for input and output.

"In most of these cases, the data movement must be written by hand," said Ken Kennedy, director of the Center for Research on Parallel Computation at Rice University and member of the President's Information Technology Advisory Committee. "These applications are therefore difficult to build."

Kennedy is leading an NPACI project to develop a compiler that handles out-of-core I/O situations automatically. The project is based on the High Performance Fortran (HPF) compiler infrastructure, developed in part by John Mellor-Crummey, Vikram Adve, and Rob Fowler at CRPC.

"HPF can distribute large arrays across processors in blocks, and the compiler generates the code to move the data," Kennedy said. "From there, it was a small step to consider managing even larger arrays that had to be stored on disk. The goal of our NPACI project is to build such a compiler."
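The block distribution Kennedy describes comes down to a little index arithmetic. The sketch below is neither HPF source nor compiler output. `block_size`, `block_bounds`, and `owner` are hypothetical helpers, and the rule they use (equal blocks of ceiling size, with the last processors possibly holding fewer elements) is one common convention, assumed here for illustration.

```python
def block_size(n, p):
    """Elements per block when n elements are split over p processors (ceiling of n/p)."""
    return -(-n // p)


def block_bounds(n, p, rank):
    """Half-open index range [lo, hi) of the block held by processor `rank`."""
    b = block_size(n, p)
    lo = min(rank * b, n)
    hi = min(lo + b, n)
    return lo, hi


def owner(n, p, index):
    """Processor that holds global element `index` under this distribution."""
    return index // block_size(n, p)
```

With 10 elements on 4 processors, the blocks are `(0, 3)`, `(3, 6)`, `(6, 9)`, and `(9, 10)`. When a processor needs an element it does not own, `owner` identifies where it lives, and generating that data movement is the job HPF hands to the compiler.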

To make these applications run efficiently, the project is pursuing three main techniques: applying previous research in restructuring the application to use data in memory as much as possible; using explicit memory movement instead of virtual memory; and pre-fetching data far enough in advance to give it time to move from disk.
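Two of these techniques, explicit data movement and pre-fetching, can be sketched by hand in a few lines. The example below is an assumption-laden illustration of what such a compiler would generate automatically, not the project's actual output. `read_block`, `out_of_core_sum`, and `write_demo_file` are hypothetical names, and the "large" array is a small temporary file of doubles.

```python
import array
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

ITEM = array.array("d").itemsize  # bytes per stored double


def read_block(path, start, count):
    """Explicitly read `count` doubles beginning at element `start`."""
    block = array.array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM)
        block.fromfile(f, count)
    return block


def out_of_core_sum(path, total, block_len):
    """Sum a disk-resident array while holding roughly two blocks in memory."""
    acc = 0.0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # the block whose read was requested one step earlier
        for start in range(0, total, block_len):
            count = min(block_len, total - start)
            # Request the next block before computing on the previous one,
            # so the disk read overlaps with the arithmetic.
            upcoming = pool.submit(read_block, path, start, count)
            if pending is not None:
                acc += sum(pending.result())
            pending = upcoming
        if pending is not None:
            acc += sum(pending.result())
    return acc


def write_demo_file(total):
    """Create a temporary file holding the doubles 0.0, 1.0, ..., total - 1."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        array.array("d", (float(i) for i in range(total))).tofile(f)
    return path
```

A caller would create the file with `write_demo_file`, pass its path to `out_of_core_sum` with a block length much smaller than the array, and delete the file afterward. The first technique, restructuring the computation so that each block is reused as much as possible while it is in memory, is not shown here.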

NPACI is funding work on experiments to apply the compiler to real-world applications. One of the first test cases will be provided by the University of Maryland's archive of satellite imagery.

OUT OF CORE I/O
PROJECT LEADER
Ken Kennedy, Center for Research on Parallel Computation, Rice University

PARTICIPANTS
Bradley Broom, Rice University

COLLABORATIONS
Earth Systems Science
Quantitative Geography for Ground Truth

ADR COMPILER

The archive of satellite imagery, as well as archives of bay and reservoir simulation data, microscopy data, and material science data, will be made more accessible by the fourth compiler project in the thrust area. The Active Data Repository (ADR) project at the University of Maryland and Johns Hopkins University supports spatial queries and customized processing of such very large data sets, in which the data describes measurements or simulated quantities in some multidimensional space.

"Scientists might want to perform simple calculations such as time averaging over a geographic area to look for the clearest day or the cloudiest day," said Joel Saltz, leader of the ADR project and NPACI's Programming Tools and Environments thrust area. "That's what the ADR does, but right now much of that has to be specified by a programmer in a relatively cumbersome manner. Just as a parallel language makes it possible to write data parallel applications, we're aiming to do the same for databases."
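The kind of query Saltz describes, a spatial selection followed by an aggregation, can be sketched on a toy data set. This is not the ADR interface. `mean_in_region`, `clearest_day`, and `cloudiest_day` are hypothetical names, and each sample is assumed to be a `(day, longitude, latitude, cloud_fraction)` tuple.

```python
def mean_in_region(samples, lon_range, lat_range):
    """Average the value of all samples inside a longitude/latitude box, per day."""
    totals = {}
    for day, lon, lat, value in samples:
        inside = (lon_range[0] <= lon <= lon_range[1]
                  and lat_range[0] <= lat <= lat_range[1])
        if inside:
            running_sum, count = totals.get(day, (0.0, 0))
            totals[day] = (running_sum + value, count + 1)
    return {day: s / c for day, (s, c) in totals.items()}


def clearest_day(samples, lon_range, lat_range):
    """Day with the lowest mean cloud fraction inside the box."""
    means = mean_in_region(samples, lon_range, lat_range)
    return min(means, key=means.get)


def cloudiest_day(samples, lon_range, lat_range):
    """Day with the highest mean cloud fraction inside the box."""
    means = mean_in_region(samples, lon_range, lat_range)
    return max(means, key=means.get)
```

Written by hand like this, the selection and the averaging are tied to one data layout. The compiler project aims to let the same request be stated declaratively and applied to very large archives.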

Currently, those who maintain data sets have to put a significant effort into customizing ADR to apply to their data set. The compiler project is designed to reduce the effort required to allow ADR to support customized spatial queries and processing for a new data set. The compiler would let programs use an extension of SQL3.

Links to other technology thrust area projects include Globus, Meta-Chaos, KeLP, and the SDSC Storage Resource Broker (SRB). Data can be obtained from ADR using Meta-Chaos layered on top of Globus. KeLP-supported programs are being coupled to ADR. ADR will soon target HPSS as well as disk caches, and ADR queries and procedures will be available through an SRB interface.

"In essence, we are providing an object-relational database compiler front end to the ADR in addition to the manual libraries for ADR access," Saltz said. "With NPACI and SDSC's Storage Resource Broker software, we will be able to offer much easier access to many very large data sets." --DH

ADR COMPILER
PROJECT LEADER
Joel Saltz, Johns Hopkins University, University of Maryland

PARTICIPANTS
Chialin Chang, Renato Ferreira, Tahsin Kurc, Alan Sussman, University of Maryland

COLLABORATIONS
Programming Tools and Environments
Active Data Repository
KeLP
Meta-Chaos
Metasystems
Globus
Data-intensive Computing
Data-handling Systems
