March 2006 Vol. 18/No. 2
By George S. Michaels
This is another in a series of CRN articles describing the activities of CRA’s industry laboratory members. Others are posted at: http://www.cra.org/reports/labs.
At Pacific Northwest National Laboratory, computational science is the foundation upon which this Department of Energy research and development laboratory depends to solve some of the greatest challenges our nation faces in national security, the environment and life sciences.
That’s a tall order. But that’s what we do at PNNL. And that work would be impossible without the Computational and Information Sciences Directorate (CISD). CISD provides the tools, and the computing and networking infrastructure, our scientists and engineers rely on to be successful. Whether the tools address climate modeling, handling huge data flows in biology and proteomics or modeling the impact of new energy systems, computation is an integral piece of delivering science-based solutions.
About CISD
CISD was formed in fall 2004 to centralize pockets of expertise in computational sciences that were scattered across the laboratory. With our computational power consolidated and aligned with specific areas of research, we are better able to use computation to advance the sciences and serve our clients.
CISD specializes in high-performance, data-intensive computing; bioinformatics and complex pattern recognition; intrinsically secure computing; information analytics; and knowledge foundations. These core research capabilities enable PNNL to provide the next generation of discovery and innovation to the Department of Energy (DOE), the U.S. Department of Homeland Security (DHS) and other clients in government, industry and academia.
Since 2004 we have recruited and hired more than 80 new researchers, bringing CISD’s total to 510 staff members. Many have experience in academia or industry, making them quick to understand the significance of projects and the need for cost-effective solutions. Together they form a formidable team, supporting all of the laboratory’s mission areas.
Among our new staff is the renowned mathematician Benoit Mandelbrot. Dr. Mandelbrot is working with us to mature the strategy for our advanced mathematics program. He is known as the father of fractal geometry, and his ability to think freely and unconventionally lends itself to creating new methods for solving the kinds of computational conundrums that science is currently confronting. Among these challenges are managing, measuring and making sense of vast amounts of data generated by proteomics research, information analytics and cyber security. One aspect of Dr. Mandelbrot’s work at PNNL is establishing a more advanced curriculum in fractal mathematics in high schools.
New capabilities available to our staff include a virtual research laboratory for evaluating key technology components and newly emerging systems for data-intensive computing. Capabilities still under development include an information analytics laboratory for advancing technologies that enable powerful visual methods for acquiring, analyzing and presenting information.
Solving Information Overload—A New Approach
Advances in computing technology have enabled scientists to collect massive amounts of data over the past two decades. However, the ability to extract valuable knowledge from multiple types of data obtained from multiple sources and scales in real time continues to be a major challenge.
Many government agencies, including DOE, the National Institutes of Health, DHS, the Department of Defense and the intelligence community, need computational capabilities well beyond the current state of the art to solve problems involving large, complex data sets.
Bringing large data sets together for analysis requires a different computing approach; it requires tools for transforming data into information that we can use. This type of approach—data-intensive computing—is one of CISD’s specialties. Our researchers work on every step of the development pathway from real-time data gathering, analysis and management to developing analytics approaches that provide added value to the data.
Our expertise in designing tools that make huge data sets meaningful has led to significant contributions to the data-intensive sciences, including predictive biology and energy sciences, nanoscience, and energy conversions. For example, The Morning Report is an advanced, proactive aviation safety and systems monitoring tool that can be extended to other domain applications to monitor massive amounts of data. The results enable domain experts to monitor complex systems by identifying typical patterns and atypical events. This tool received the 2005 R&D 100 Award and R&D Magazine’s Editor’s Choice Award.
In another example, PNNL scientists are using computer modeling of proteins found in bacteria membranes to discover new methods of fighting infections. They currently are developing a computer model of the cell wall of an aggressive bacterium, Pseudomonas aeruginosa, which infects the respiratory systems of cystic fibrosis sufferers. By modeling the cell wall, scientists hope to discover how the membranes and proteins enable the bacterium to elude treatment by traditional antibiotics, potentially leading to new treatment strategies.
Knowledge Centers—Informational Tools for Visual Analysis
Another way to address information overload is visual analysis. CISD has created three types of knowledge centers (science-based, technology-based and mission-based) to tackle the daunting tasks of collecting, managing, visualizing and analyzing massive accumulations of data using unique software products.
For example, the National Visualization and Analytics Center (NVAC™) is a science-based knowledge center at PNNL, established by DHS to develop the next generation of tools and scientists for creating visual methods of analyzing and conveying complex information. NVAC has been tasked with establishing the nation’s research agenda in this area and taking visual technologies to new levels.
Through NVAC, we are organizing a consortium of stakeholders from multiple government agencies, academia and industry to ensure relevant research, integration and interoperability resulting in deployable systems for defending our nation. We recently held the first consortium meeting at PNNL, which was attended by leaders of more than 40 of the nation’s foremost computing and analytical companies. As a result, industry leaders formed the first Industrial Visual Analytics Center (IVAC), which is now in the early stages of development.
In addition, we are developing partnerships with universities to advance the science, including establishing regional visualization and analytics centers (RVACs) to bring academic expertise to the task of discovering information that may forewarn officials of a terrorist attack. RVACs include University of North Carolina at Charlotte and Georgia Institute of Technology; Purdue University and Indiana University School of Medicine; Pennsylvania State University; and University of Washington and Stanford University.
Beyond developing innovative technologies, NVAC will stimulate the talent required for both invention and operation of the field's new suite of tools. This means a steady flow of staff exchanges, building new curriculums, and hosting interdisciplinary workshops and conferences among academia, industry and other laboratories.
Changing the Game
CISD aims to deliver the highest-end computing capability for the nation. To achieve this goal, we are partnering with industry and academia to drive the development of new computing paradigms in both supercomputing architectures and scalable software.
In supercomputing, we are developing a fundamentally new approach to scientific discovery that uses high-performance computing based on informatics rather than only physics. This approach addresses the need to produce, collect, store, explore, analyze and quickly share huge amounts of scientific information. Much of the effort is centered on creating algorithms, software, operating systems and new computational and storage systems to solve a broad set of problems involving large, complex heterogeneous data.
On the software side, we are creating new scalable data-analysis tools and new tools for discovering patterns in large heterogeneous databases and for integrating data across different space and time scales. For example, we demonstrated the feasibility of high throughput access to remote file systems in a computational chemistry simulation that received the StorCloud Award at the 2005 Supercomputing Conference.
PNNL also finished second in the Bandwidth Challenge, transferring 41 Gigabits per second during the challenge test. At the low end, this equals transferring and processing a full DVD of video every second. Both demonstrations operated over the recently established PNNL Regional Optical Network and UltraScience Network to move data between a Hewlett-Packard parallel file system located at the conference in Seattle and an Itanium compute cluster located at the laboratory.
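The DVD comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the article: that "Gigabits" means 10^9 bits, and that a "full DVD" is a single-layer disc holding 4.7 GB. The function name is hypothetical.

```python
# Rough check of the "one DVD per second" comparison.
# Assumptions (not from the article): 1 gigabit = 10**9 bits, and a
# single-layer DVD holds 4.7 * 10**9 bytes.
BITS_PER_GIGABIT = 10**9
DVD_CAPACITY_BYTES = 4.7e9


def dvds_per_second(gigabits_per_second, dvd_bytes=DVD_CAPACITY_BYTES):
    """Convert a link rate in gigabits per second to DVDs per second."""
    bytes_per_second = gigabits_per_second * BITS_PER_GIGABIT / 8
    return bytes_per_second / dvd_bytes


rate = dvds_per_second(41)
print(f"{rate:.2f} DVDs per second")
```

Under these assumptions, 41 gigabits per second is about 5.1 gigabytes per second, slightly more than one single-layer DVD each second, which is consistent with the comparison above.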
The PNNL team is taking on global and national challenges. We invite others interested in making immediate as well as long-term impacts to join us in conducting research and developing technology in the computational and information sciences that will drive changes in computing over the next decade.
For more information about CISD, see the CISD Web site at http://computing.pnl.gov/.
Dr. George S. Michaels (george.michaels@pnl.gov) is Associate Laboratory Director of Pacific Northwest National Laboratory’s (PNNL) Computational and Information Sciences Directorate. A pioneer in bioinformatics, Dr. Michaels founded one of the nation’s first doctoral programs in computational sciences while teaching at George Mason University in Fairfax, Virginia. Much of his career has focused on computational analysis and applying statistical models to life science research.