LHC Data Abundance Puts Brookhaven Computing to the Test
By Yuhas | November 5, 2010
Speaking above the whir of row upon row of central processing units (CPUs), magnetic disks, tapes, and air conditioning units that store, distribute, and cool quadrillions of bytes of data in the RHIC and ATLAS Computing Facility (RACF), director Michael Ernst jokes:
“I’m the one trying to hold things together!”
After months of preparation, the ATLAS experiment — based at CERN’s Large Hadron Collider (LHC) in Geneva, Switzerland — has started collecting data, putting computing to the test.
In March, the LHC began its first physics run at the unprecedented energy level of 7 trillion electron volts. The world’s most powerful accelerator creates millions of collisions per second, meaning a tremendous amount of data is collected by the detector-based experiments and managed around the globe.
From the moment collection began, the volume of data has grown rapidly. By August, after only six months of operation, it had climbed to two petabytes of raw data plus an equivalent amount of analyzed event summaries. That’s enough to fill more than 8,000 MacBook laptops (at 500 gigabytes each) with information.
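The laptop comparison checks out with simple arithmetic. A minimal sketch, using the figures quoted above and decimal units (1 petabyte = 1,000,000 gigabytes):

```python
# Back-of-the-envelope check of the figures above (values from the article).
raw_data_pb = 2        # petabytes of raw data after six months
derived_pb = 2         # "an equivalent amount" of analyzed event summaries
laptop_gb = 500        # storage per MacBook laptop

total_gb = (raw_data_pb + derived_pb) * 1_000_000  # 1 PB = 1,000,000 GB
laptops = total_gb / laptop_gb
print(f"{laptops:,.0f} laptops")  # 8,000 laptops
```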
Michael Ernst, in addition to overseeing the centralized computing needs of RHIC, maintains the primary U.S. artery for the widespread grid network that handles data from the ATLAS experiment.
Ernst explained that the rate of accumulation came as a surprise. Based on studies performed a year ago, he and his colleagues expected a rate of 2 gigabytes per second to come in across all 10 major ATLAS computing centers worldwide.
The actual rate in March, however, was twice what was expected, and by May, the data streaming in had climbed to four times the expected rate. Although higher than anticipated, with peaks of nearly 10 gigabytes of data per second, this marked an exciting success for ATLAS computing facilities.
“What we found was that our facilities could accommodate a data rate well beyond the initial target,” Ernst said.
At BNL specifically, the RACF’s data rate of more than 70 gigabits per second is fast enough to transfer all of the publicly available digitized data in the Library of Congress in about two hours and 15 minutes. Try that at home with an ordinary cable modem connection, and the same transfer would take more than a year.
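Those two numbers can be reconciled with a quick calculation. A sketch, where the RACF rate and transfer time come from the article; the implied size of the Library of Congress data set is derived from them, and the home cable-modem speed (about 15 megabits per second) is an illustrative assumption, not a figure from the article:

```python
# Rough consistency check of the transfer-time comparison.
racf_bps = 70e9                    # RACF rate: 70 gigabits per second
transfer_s = 2 * 3600 + 15 * 60    # "about two hours and 15 minutes"

# Implied size of the Library of Congress data set:
loc_bits = racf_bps * transfer_s
print(f"~{loc_bits / 8 / 1e12:.0f} TB")  # ~71 TB

modem_bps = 15e6                   # assumed home cable-modem speed
days = loc_bits / modem_bps / 86400
print(f"roughly {days:.0f} days at home")  # well over a year
```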
RACF receives 22 percent of the ATLAS data sent out from CERN, meaning that data from a little more than one in every five collision events selected by the detector ends up at BNL.
This data is kept on a combination of magnetic disks and magnetic tape, systems that are more efficient and secure than many conventional storage methods used with personal computers.
For example, RACF makes 7.5 petabytes of data storage available to the ATLAS experiment on magnetic disks. If this data were stored on common 700-megabyte CDs, the stack would be more than 10,000 meters high.
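The stack height follows from the disk capacity and CD size quoted above. A minimal sketch in decimal units; the thickness of roughly 1.2 millimeters per CD is an assumed typical value, not a figure from the article:

```python
# Sketch of the CD-stack comparison (1 PB = 1e15 bytes, 1 MB = 1e6 bytes).
disk_storage_pb = 7.5     # disk storage available to ATLAS at RACF
cd_mb = 700               # capacity of a common CD
cd_thickness_m = 0.0012   # assumed ~1.2 mm per disc

cds = disk_storage_pb * 1e15 / (cd_mb * 1e6)
stack_m = cds * cd_thickness_m
print(f"{cds:,.0f} CDs")            # more than 10,000,000 CDs
print(f"stack ~{stack_m:,.0f} m")   # well over 10,000 meters
```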
“That’s more than 10,000,000 CDs,” Ernst said. “I would not want to manage them!”
So how does data get from the ATLAS detector to a physicist’s desktop, and what happens in between?
This story begins in the detector, where a trigger and data acquisition system sifts through the particle collisions so that, out of a billion events occurring each second, only the 200 most interesting are saved. CERN’s computing facility then processes the data acquired from the detector. This stage is referred to as Tier-0, the starting point for data to be disseminated across the Grid.
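The severity of that selection is worth spelling out. A quick sketch using the two rates quoted above:

```python
# Illustrative calculation of the trigger's rejection factor.
events_per_second = 1_000_000_000   # collisions occurring each second
kept_per_second = 200               # events the trigger actually saves

rejection = events_per_second / kept_per_second
print(f"only 1 in {rejection:,.0f} events is kept")  # 1 in 5,000,000
```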
CERN sends ATLAS’ raw data out to ten Tier-1 facilities. In the United States, Brookhaven is the Tier-1 computing facility for ATLAS, with bandwidth dedicated exclusively to data transfer to and from CERN and the other Tier-1 facilities around the globe. Here, under the watchful eye of Ernst and his colleagues in the computing facility, the data is managed for two purposes: archiving and analysis.
Brookhaven serves as a primary source for ATLAS data, archiving the U.S.’s share of the raw data sent by CERN. This is valuable both as a secure storage space and for physicists who want to work with data as close to the raw form as possible.
ATLAS computing at Brookhaven also prepares data for various physics analyses. Ernst and his colleagues reconstruct raw data to provide collision event summaries and analysis objects. These computing activities analyze and filter the data to provide physicists with the material most applicable to their research.
Brookhaven’s Tier-1 computing facility then sends the data out to Tier-2 computing facilities in the U.S., which include eight universities and SLAC National Accelerator Laboratory.
Through the network of ATLAS computing facilities, physicists can access data and request the execution of various analysis tasks.
“In order for this to work, we need good integration and good communication,” Ernst said. “This system is really making worldwide computing transparent. That’s key to global computing and collaboration.”
2010-2079 | Media & Communications Office