METASYSTEMS

A Good Climate for Distributed Computing

PROJECT LEADER
Rich Wolski, Research Scientist, Department of Computer Science and Engineering, UC San Diego

Sometimes the road is clear and traffic zips along, sometimes the roads are flooded and traffic is jammed, and sometimes there are scattered bursts of data. Such is the weather on the Internet. To track the flow of network traffic and try to predict whether conditions are clear or clogged at any given time, UC San Diego computer scientist Rich Wolski has created the Network Weather Service (NWS). NWS predicts performance for networked systems, making this information available to automated schedulers and to humans who want to gauge the network's quality of service.

Heterogeneous networked systems, by definition, include computers of various types, capabilities, and resources, but the capabilities and resources change in response to many external factors. Wide-area distributed processing works best for the most accessible computers, and accessibility depends on such factors as data transfer rates on network links and the amount of traffic between sites. How can a program determine the "best" schedule in a dynamic environment?

WHAT'S THE FORECAST?

Instead of temperature and barometric pressure, NWS bases its predictions on readings from sensors--network monitors, CPU monitors, and so on--distributed across the computers on the network. NWS then uses numerical models to generate forecasts of what the conditions will be for a given time frame.

NWS can dynamically choose the best of these forecasting models by comparing each model's predictions with recent system behavior. It tracks the accuracy of every model and, at any given moment, generates its forecast with the model that has the lowest cumulative error.
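The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not the NWS source code: the three predictors (`last_value`, `running_mean`, `sliding_median`), the class name `AdaptiveForecaster`, and the use of absolute error as the accuracy measure are all assumptions made for the example.

```python
from statistics import mean, median


# Hypothetical candidate models; each maps a measurement history to a prediction.
def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=3):
    return median(history[-window:])


class AdaptiveForecaster:
    """Keeps a cumulative error per model and forecasts with the current best."""

    def __init__(self, forecasters):
        self.forecasters = dict(forecasters)
        self.history = []
        self.cumulative_error = {name: 0.0 for name in self.forecasters}
        self.pending = {}  # each model's prediction for the next measurement

    def update(self, measurement):
        # Score every model's previous prediction against what actually happened.
        for name, predicted in self.pending.items():
            self.cumulative_error[name] += abs(predicted - measurement)
        self.history.append(measurement)
        # Ask every model for its next prediction.
        self.pending = {
            name: model(self.history) for name, model in self.forecasters.items()
        }

    def best_model(self):
        # Ties go to the model listed first.
        return min(self.cumulative_error, key=self.cumulative_error.get)

    def forecast(self):
        if not self.pending:
            raise ValueError("no measurements recorded yet")
        return self.pending[self.best_model()]


if __name__ == "__main__":
    nws_like = AdaptiveForecaster({
        "last_value": last_value,
        "running_mean": running_mean,
        "sliding_median": sliding_median,
    })
    # Made-up throughput readings with a sudden shift in conditions.
    for reading in [10, 10, 10, 10, 50, 50, 50, 50]:
        nws_like.update(reading)
    print(nws_like.best_model(), nws_like.forecast())
```

With the made-up readings above, the running mean is slow to react to the shift, so its cumulative error grows and the forecaster stops trusting it. Because the choice is re-evaluated on every update, a different model can take over when conditions change again.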

"All scheduling decisions are based on predictions of the underlying resource performance," Wolski said. "The Network Weather Service provides an automated way to make such predictions dynamically, and then to make them available for scheduling purposes."


Figure 1: Measuring Throughput on the vBNS

A graph generated by the Network Weather Service of throughput on the vBNS high-speed network between a host at SDSC and one at the University of Illinois.

UNOBTRUSIVE WEATHER STATIONS

NWS is used today in NPACI to track and monitor end-to-end performance on the vBNS, the high-performance research network that some regard as the successor to today's Internet. NWS operates a set of sensors that correspond to "weather stations," from which it gathers readings of the current conditions. It relies on reports in real time from the computers on the network and from dedicated monitors on the network itself. The monitors typically take performance measurements once a minute.

Wolski and researchers in UC San Diego's Heterogeneous Computing Group have attempted to engineer the system to be non-intrusive. Resident software requires a tiny fraction of CPU power--so little that it's difficult to measure accurately. The code requires less than 3.3 MB of memory to run (less than some PC e-mail programs need) and occupies less than 20 MB of disk storage. "We recognize our status as guests, and we adhere to the laws and customs of local computer fiefdoms in every way we can," said Neil Spring, one of the researchers on the project.

MORE THAN TALKING ABOUT THE WEATHER

NWS was developed for dynamic schedulers and to provide Quality-of-Service readings on a network. The AppLeS scheduling methodology makes extensive use of its facilities, and was the first software system to integrate it (p. 12). Over the next year, the UC San Diego researchers will more tightly integrate NWS monitoring and forecasting facilities into AppLeS.

Wolski's group is also developing NWS implementations for the Legion and Globus metasystems. Each prototype forecasts the process-to-process network performance (latency and bandwidth) and available CPU percentage for each machine that it monitors. NWS has also been used under SDSC's Distributed Object Computational Testbed to distribute processing in an "embarrassingly parallel" gene sequence library comparison program and in the University of Wisconsin's Condor high-throughput computing project.
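To make concrete how a scheduler might consume the three forecast quantities named above (latency, bandwidth, and available CPU percentage), here is a minimal Python sketch. It is not the AppLeS, Legion, or Globus method. The `Forecast` record, the simple cost model (transfer time plus compute time stretched by CPU availability), and the host names and numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """Hypothetical per-host forecast, as a scheduler might receive it."""
    host: str
    latency_s: float        # predicted one-way latency, seconds
    bandwidth_mbps: float   # predicted bandwidth, megabits per second
    cpu_available: float    # predicted available CPU, fraction between 0 and 1


def estimated_completion_s(forecast, data_megabits, cpu_seconds):
    """Assumed cost model: ship the data, then compute on the available CPU share."""
    transfer = forecast.latency_s + data_megabits / forecast.bandwidth_mbps
    compute = cpu_seconds / forecast.cpu_available
    return transfer + compute


def pick_host(forecasts, data_megabits, cpu_seconds):
    """Return the forecast whose host has the smallest estimated completion time."""
    usable = [
        f for f in forecasts if f.bandwidth_mbps > 0 and f.cpu_available > 0
    ]
    if not usable:
        raise ValueError("no host has usable bandwidth and CPU")
    return min(
        usable,
        key=lambda f: estimated_completion_s(f, data_megabits, cpu_seconds),
    )


if __name__ == "__main__":
    # Made-up forecasts for two hypothetical hosts.
    forecasts = [
        Forecast("fast-link-busy-cpu", latency_s=0.05,
                 bandwidth_mbps=100.0, cpu_available=0.25),
        Forecast("slow-link-idle-cpu", latency_s=0.20,
                 bandwidth_mbps=10.0, cpu_available=0.90),
    ]
    # Compute-heavy task: little data, a lot of CPU work.
    print(pick_host(forecasts, data_megabits=80, cpu_seconds=60).host)
    # Data-heavy task: a lot of data, little CPU work.
    print(pick_host(forecasts, data_megabits=8000, cpu_seconds=10).host)
```

Under this cost model the compute-heavy task goes to the idle host despite its slow link, while the data-heavy task goes to the host with the fast link. The "best" schedule therefore depends on both the application and the forecast conditions.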

HERE, THERE, AND EVERYWARE

Wolski, who will become an assistant professor at the University of Tennessee in January 1999, is developing an entry for the SC98 Grand Challenge to demonstrate how AppLeS and NWS techniques can be combined to solve large-scale problems. The prototype, Everyware, is a set of distributed tools that enables applications to exploit different execution environments. For example, an Everyware program can combine resources supporting the Globus or Legion infrastructures with more disparate resources supporting only Unix and Unix networking. Similarly, batch-controlled large-scale parallel computers can be coupled with interactive workstations or workstation clusters. An Everyware program is scheduled by migratory agents that monitor progress and continually re-apportion resources to improve performance.

"The goal of Everyware is to make truly large-scale computing possible," Wolski said. "This is another step in the effort to construct operational metasystems for everyday use. Everyware and NWS are separate but complimentary pieces of the distributed computing puzzle." --MG