Published 06/12/2000
Contact: David Hart, SDSC, dhart@sdsc.edu, 858-534-8314
UNIVERSITY OF CALIFORNIA, SAN DIEGO -- The High Performance Computing (HPC) Systems group at the San Diego Supercomputer Center (SDSC) has recently posted the National Partnership for Advanced Computational Infrastructure (NPACI) JOBLOG Job Trace Repository to the Web at http://joblog.npaci.edu. The primary objective in posting the repository for public access is to provide a database of NPACI computer system logs to both NPACI resource users and computer science researchers studying parallel scheduling.
Unlike desktop computers, where users interact continuously with programs, researchers who use parallel supercomputers often run programs as self-contained "jobs," which must be scheduled to run when processors become available. Scheduling jobs on parallel computers is complex: many users compete for the machine at once, and each job may require large numbers of processors for hours at a time. Job traces show that current schedulers already use these machines efficiently, but researchers must keep improving scheduling software as supercomputing technology advances.
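To make the scheduling problem concrete, the following is a minimal sketch, in Python, of a first-come-first-served batch scheduler. The job fields used here (submit time, processors requested, run time) are illustrative simplifications for the example, not the actual JOBLOG record layout.

    import heapq

    def simulate_fcfs(jobs, total_procs):
        """Replay (submit_time, procs, run_time) jobs, sorted by submit_time,
        through a first-come-first-served scheduler; returns each job's wait."""
        free = total_procs
        running = []        # min-heap of (finish_time, procs) for running jobs
        clock = 0.0
        waits = []
        for submit, procs, run in jobs:
            clock = max(clock, submit)
            # Reclaim finished jobs' processors, advancing the clock
            # when this job must wait for enough processors to free up.
            while running and (free < procs or running[0][0] <= clock):
                finish, p = heapq.heappop(running)
                clock = max(clock, finish)
                free += p
            free -= procs
            waits.append(clock - submit)
            heapq.heappush(running, (clock + run, procs))
        return waits

Production schedulers layer priorities, backfilling, and reservations on top of this basic queue discipline, which is exactly what makes realistic traces valuable for evaluating them.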
"Sharing data regarding system usage with the research community is extremely important, and I only wish more supercomputer installations did the same," said Dror Feitelson, a computer science researcher specializing in parallel job scheduling and performance evaluation at the Hebrew University in Jerusalem. Feitelson, who has developed a comprehensive archive of job traces, said "a major problem that we parallel scheduling researchers face is gaining access to high-quality, comprehensive production logs."
Walfredo Cirne, assistant professor of Computer Science at Brazil's Universidade Federal da Paraiba, agreed with Feitelson. "One of the fundamentals of parallel scheduling research is determining how parallel supercomputers are used, which requires access to supercomputer production logs. Access to data such as that now provided by SDSC can be difficult, often impossible," said Cirne, who is currently working towards his Ph.D. at the University of California, San Diego (UCSD).
Real workload data is very important to scheduling researchers because it enables them to use simulation to evaluate their ideas in a realistic scenario. Simulation also lets researchers investigate new ideas without changing how the supercomputer operates in production. Driving the simulations with real workloads helps ensure that the results will hold in practice.
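As an illustration of trace-driven simulation, the sketch below replays a job trace through the scheduler sketched earlier. The whitespace-separated column layout and the file name joblog.trace are assumptions for the example, not the actual repository format.

    def load_trace(path):
        """Read one job per line: submit_time procs run_time; '#' starts a comment."""
        jobs = []
        with open(path) as f:
            for line in f:
                if not line.strip() or line.startswith("#"):
                    continue
                submit, procs, run = line.split()[:3]
                jobs.append((float(submit), int(procs), float(run)))
        return sorted(jobs)

    waits = simulate_fcfs(load_trace("joblog.trace"), total_procs=128)
    print("mean wait: %.1f seconds" % (sum(waits) / len(waits)))

The same trace can then be replayed under a proposed policy, and the resulting wait times compared, all without touching the production machine.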
This need for accessible production logs was recognized several years ago by SDSC's HPC Systems Manager, Victor Hazlewood, when he and former SDSC Junior Fellow and intern Allen Downey started generating data sets by hand for Downey's dissertation project at the University of California, Berkeley. "These manual job logs that Victor and I developed let me build new theories and workload models, which allowed me to finish my project successfully," said Downey, now an assistant professor of computer science at Colby College in Maine.
When Hazlewood realized how important the production data sets were to Downey's project, he started to develop the public repository. "Due to the nature of my position at SDSC, I am constantly generating reports regarding SDSC production data, the same type of data that parallel scheduling researchers need to continue their work. The primary issue with releasing production information for public consumption was maintaining the privacy of our users," Hazlewood said. "Our solution was to encrypt the user identification and account name, but the rest of the job information is available for researchers to use in their continued development of parallel scheduling."
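A sketch of the general anonymization technique follows; the exact scheme SDSC uses is not described here, and the keyed-hash approach, field names, and key shown are illustrative assumptions. A keyed hash maps each user and account name to a stable opaque token, so per-user patterns remain visible in the trace while identities stay hidden.

    import hmac, hashlib

    SECRET_KEY = b"site-private-key"    # held by the site, never published

    def anonymize(field):
        # Keyed hash yields a stable opaque token; an unkeyed hash could be
        # reversed by guessing likely usernames, so the secret key matters.
        return hmac.new(SECRET_KEY, field.encode(), hashlib.sha256).hexdigest()[:12]

    record = {"user": "jsmith", "account": "chem042", "procs": 64, "run_time": 3600}
    record["user"] = anonymize(record["user"])
    record["account"] = anonymize(record["account"])
    print(record)   # job details intact, identities replaced by stable tokens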
Current topics in parallel scheduling research include modeling the memory requirements of parallel jobs so that schedulers can make good decisions about placing multiple processes on a single processor. "The data sets from SDSC were the first to include information about memory use, and NPACI's JOBLOG Job Trace Repository provides the right tools for getting this information about current machines," said Downey.
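As a hypothetical illustration of the decision such memory data supports, the fragment below checks whether a new process fits alongside those already placed on a processor; the 512 MB per-processor figure and the field names are assumptions for the example.

    NODE_MEMORY_MB = 512    # assumed per-processor memory; varies by machine

    def can_colocate(resident_jobs, candidate_mem_mb):
        """True if a new process fits in memory alongside those already placed."""
        used = sum(job["mem_mb"] for job in resident_jobs)
        return used + candidate_mem_mb <= NODE_MEMORY_MB

Without per-job memory figures in the trace, a scheduler model has no basis for deciding when such co-location is safe, which is why the memory data in these logs matters.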
"This will significantly improve our ability to create real-world solutions for anticipated scheduling problems of the future," Cirne said.
The San Diego Supercomputer Center (SDSC) is an organized research unit of the University of California, San Diego, and the leading-edge site of the NPACI ( http://www.npaci.edu/). SDSC is funded by the National Science Foundation through NPACI and other federal agencies, the State and University of California, and private organizations. For additional information about SDSC and NPACI, see http://www.sdsc.edu/ or contact David Hart, dhart@sdsc.edu, 858-534-8314.