
GRID COMPUTING

The Big Picture: Viewing Terascale Data Sets Interactively from the Desktop

PROJECT LEADERS
Reagan Moore, Bernard Pailthorpe
SDSC
PARTICIPANTS
Scott Baden
UC San Diego
Joel Saltz, Alan Sussman, Tahsin Kurc
University of Maryland
Jon Genetti, Dave Nadeau, Mike Wan, Arcot Rajasekar
SDSC

The old maxim states, "a picture's worth a thousand words." Today the Department of Energy uses powerful supercomputers to run increasingly realistic simulations of weapons tests that can no longer be done in the real world. But these simulations generate terabytes of data--thousands of billions of words. So researchers need a data visualization corridor, a path to move the data to their desktop workstations where they can display, interact with, and make sense of this abundant information. NPACI researchers are developing a data-handling system to make real-time visualization and interaction with these massive data sets a reality.

In large simulations, researchers must compute grids with one billion simulation cells, producing multi-terabyte data sets that can overwhelm existing storage, management, access, networking, and visualization technologies. Such is the case for the Department of Energy's (DOE) Visual Interactive Environment for Weapons Simulation (VIEWS) program, part of the Accelerated Strategic Computing Initiative (ASCI) to move weapons testing into the world of computer simulations.
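
A rough back-of-envelope estimate shows why billion-cell grids quickly reach the terabyte range. The sketch below uses illustrative assumptions (about ten double-precision field variables per cell and a few dozen saved time steps), not figures from the VIEWS program:

    # Illustrative estimate only; variable and snapshot counts are assumptions.
    cells = 1_000_000_000          # one billion simulation cells
    variables_per_cell = 10        # assumed: density, pressure, velocity, ...
    bytes_per_value = 8            # double precision
    snapshots = 25                 # assumed number of saved time steps

    snapshot_bytes = cells * variables_per_cell * bytes_per_value
    total_bytes = snapshot_bytes * snapshots
    print(f"per snapshot: {snapshot_bytes / 1e9:.0f} GB")   # ~80 GB
    print(f"whole run:    {total_bytes / 1e12:.1f} TB")     # ~2 TB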

BUILDING A DATA VISUALIZATION CORRIDOR

To make sense of immense data sets, researchers need not only to visualize the data but also to see it in multiple views that they can control interactively, in real time, from their desktop workstations. To provide scientists with this capability, SDSC and NPACI researchers are developing and adapting a number of tools for handling and visualizing data. This Terascale Visualization project is being carried out in collaboration with Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and Sandia National Laboratories.

Interacting with simulations run on remote high-performance computing resources requires a data visualization corridor connecting the archives that hold the large, complex data sets to the user's visualization display. "Advanced displays today present about 8 MB of data on each screen at 24 frames per second--about 200 MB per second," says Reagan Moore, who leads SDSC's Data-Intensive Computing Environments group. "At this rate, it would take about an hour and a half to access one terabyte, far too slow to be interactive. So we're applying data-handling systems to help the researcher interactively identify and access smaller subsets of the key data that they need to see in order to understand what's going on."
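
The arithmetic behind Moore's estimate can be checked in a few lines, using the figures quoted above:

    frame_bytes = 8 * 2**20                           # ~8 MB per displayed frame
    frames_per_second = 24
    display_rate = frame_bytes * frames_per_second    # ~200 MB per second
    terabyte = 2**40
    hours = terabyte / display_rate / 3600
    print(f"{display_rate / 2**20:.0f} MB/s, {hours:.1f} hours per terabyte")
    # -> 192 MB/s, 1.5 hours per terabyte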


NAVIGATING THROUGH LARGE DATA SETS

A typical 3-D hydrodynamic simulation, for example, contains regions of complex eddies. A researcher may notice that, especially near the boundary, the flow has developed characteristics that might not be physically possible; that region then needs to be viewed in greater detail to determine whether the simulation is still valid there.

What tools can researchers use to find their way through the vast sets of 3-D data produced by such large simulations? The approach NPACI researchers are taking is to navigate to the region of interest through low-resolution representations of the data set while simultaneously displaying the smaller focal region at high resolution.

A key aspect of this project is linking the data-handling and visualization environments. Part of the solution is the Kernel Lattice Parallelism (KeLP) software, developed by Scott Baden of UC San Diego. KeLP is used to define a "floor plan" that specifies the region--in this case, the boundary of the hydrodynamic simulation--that the researcher needs to access and visualize at high resolution, as well as the larger regions that need only be accessed at low resolution.
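
In outline, a floor plan amounts to a list of boxes in the simulation grid, each tagged with the resolution at which it must be delivered. The sketch below illustrates the idea only; it is not KeLP's actual C++ interface, and the class and field names are hypothetical:

    # Minimal sketch of the floor-plan idea; names here are hypothetical,
    # not KeLP's real API.
    from dataclasses import dataclass

    @dataclass
    class RegionSpec:
        lo: tuple          # lower corner of the box, in grid indices
        hi: tuple          # upper corner of the box
        resolution: str    # "high" for the region of interest, "low" elsewhere

    # Whole 1024^3 domain at low resolution, plus a high-resolution box
    # around the simulation boundary the researcher wants to inspect.
    floor_plan = [
        RegionSpec(lo=(0, 0, 0),   hi=(1023, 1023, 1023), resolution="low"),
        RegionSpec(lo=(0, 960, 0), hi=(1023, 1023, 1023), resolution="high"),
    ]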

The SDSC Storage Resource Broker--client-server middleware that provides a uniform interface and advanced metadata cataloging to give users rapid access to heterogeneous, distributed storage resources--is then used to control execution of the integrated DataCutter middleware, developed at the University of Maryland, which extracts the desired subsets of data at the location where the data are stored.

DataCutter extracts a series of data subsets that are then rendered in 3-D by SDSC's Volume Imaging Scalable Toolkit Architecture (VISTA) software to produce a slow-motion animation of the overall hydrodynamic simulation at low resolution. The researcher then interactively defines smaller regions of interest and, guided by the low-resolution image, directs DataCutter to return a high-resolution subset of just that region--the swirling flow of the eddies near the simulation boundary.
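
The interactive loop described above might be sketched as follows. The function names here stand in for the SRB/DataCutter extraction and VISTA rendering steps and are hypothetical placeholders, not the actual interfaces:

    # Hypothetical stand-ins for the extraction and rendering systems.
    def extract_subset(dataset, region, resolution):
        return {"dataset": dataset, "region": region, "resolution": resolution}

    def render_volume(frame):
        print("rendering", frame)

    def pick_region_of_interest():
        return ((0, 960, 0), (1023, 1023, 1023))   # e.g. box near the boundary

    def browse(dataset, floor_plan):
        # 1. Pull a coarse version of every region and animate the overview.
        overview = [extract_subset(dataset, r, "low") for r in floor_plan]
        for frame in overview:
            render_volume(frame)
        # 2. Guided by the overview, the researcher outlines a smaller box,
        #    e.g. the eddies near the simulation boundary.
        roi = pick_region_of_interest()
        # 3. Only that box is extracted at full resolution, at the site where
        #    the data are stored, and rendered for the desktop display.
        render_volume(extract_subset(dataset, roi, "high"))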

VISTA supports perspective views, simultaneous rendering of multiple independent, possibly overlapping data sets, and compositing of images for display. These images are transmitted back to the user's location and displayed on the desktop workstation. SDSC researchers are also adapting the KeLP and DataCutter software to handle unstructured data.
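
Compositing partial renderings of overlapping data sets is commonly done with the front-to-back "over" operator. The sketch below shows that standard operation in generic form; it is an illustration, not VISTA's internal code:

    # Generic front-to-back "over" compositing; not VISTA's implementation.
    import numpy as np

    def composite_over(layers):
        """layers: list of (rgb, alpha) image pairs, ordered front to back."""
        height, width = layers[0][0].shape[:2]
        out_rgb = np.zeros((height, width, 3))
        out_alpha = np.zeros((height, width, 1))
        for rgb, alpha in layers:
            weight = (1.0 - out_alpha) * alpha   # how much still shows through
            out_rgb += weight * rgb
            out_alpha += weight
        return out_rgb, out_alpha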

As these innovative tools are further developed and integrated, researchers will have increased ability to identify and access the growing data sets generated by DOE VIEWS simulations and other important research programs--opening increasingly powerful windows into terabyte data sets. --PT
