Composite of two images, captured by the Data Capacitor, of a crystalline specimen being examined with X-ray diffraction techniques at the IU Molecular Structure Center.
Modern computational resources provide research scientists with access to unprecedented amounts of data at rates previously thought impossible — data that may help to predict the next big hurricane, or discover a life-saving new drug. But with this incredible opportunity for discovery comes the very real challenge of how to manage and share these massive data sets.
In answer to this challenge, a team from University Information Technology Services, the IU School of Informatics, and Pervasive Technology Labs has developed the Data Capacitor, a 535-terabyte wide-area Lustre filesystem that lets scientists temporarily store and manipulate very large data sets. The system is built to be easy to use, allowing scientists to share data quickly even across great distances.
The project is supported by a $1.7 million grant from the National Science Foundation and is directed by principal investigator Craig A. Stewart and Data Capacitor project team leader Stephen Simms. Project co-PIs include Randall Bramley, Catherine Pilachowski and Beth Plale.
The Data Capacitor created a buzz in the research community early this summer when it demonstrated an exceptionally fast single-client transfer rate of 977 megabytes per second across the TeraGrid network.
"Imagine being able to move 12 DVDs' worth of data from your desktop machine onto a filesystem two states away in a single minute," said Stephen Simms. "This technology has the potential to significantly change how scientists collaborate across distances."
The Data Capacitor team, along with partners from the Technische Universitaet Dresden, recently announced that they had demonstrated impressive performance on a Lustre filesystem distributed across the Atlantic, opening the door to greater collaboration between scientists in the U.S. and Europe.
The Data Capacitor has supported several high-profile IU research projects, including the Linked Environments for Atmospheric Discovery (LEAD) Project, a weather and storm prediction portal, and the Common Instrument Middleware Architecture (CIMA) project, which allows scientists to operate instruments at remote labs and manage the collected data without leaving their home labs.
In conjunction with the CIMA project, scientists at IU's Molecular Structure Center use the Data Capacitor to support X-ray crystallography research. A stream of data from the lab, along with metadata about each sample (temperature, humidity, pictures of the crystal, and so on), is written to the Data Capacitor. Through the CIMA project, the Data Capacitor also helps manage similar data for a global consortium of more than a dozen crystallography labs at universities and national facilities in the US, UK and Australia.
"The Data Capacitor has been exceptionally valuable to the CIMA project," said Donald F. McMullen, the project's principal investigator. "Its capacity and throughput allowed us to design and implement a system that supports data sharing and maintains workflows involving massive amounts of instrument data."

The Data Capacitor's 52 servers are spread across six water-cooled racks housed at the Wrubel Computing Center on the IUB campus.
More info on the Data Capacitor: http://datacapacitor.researchtechnologies.uits.iu.edu/
More info on the LEAD Project: https://portal.leadproject.org/gridsphere/gridsphere
More info on the CIMA Project: http://www.instrument-middleware.org/metadot/index.pl
