By Shigeki Misawa and Ofer Rind
Shigeki Misawa (left) and Ofer Rind at the RHIC & ATLAS Computing Facility (RACF) at Brookhaven Lab
Run 13 at the Relativistic Heavy Ion Collider (RHIC) began one month ago today, and the first particles collided in the STAR and PHENIX detectors nearly two weeks ago. As of late this past Saturday evening, preparations were complete: polarized protons are now colliding, and the machine and detectors are operating in "physics mode." That means gigabytes of data are pouring into the RHIC & ATLAS Computing Facility (RACF) every few seconds.
Today, we store data and provide the computing power for about 2,500 RHIC scientists here at Brookhaven Lab and institutions around the world. Approximately 30 people work at the RACF, which is located about one mile south of RHIC and connected to both the Physics and Information Technology Division buildings on site. There are four main parts to the RACF: computers that crunch the data, online storage containing data ready for further analysis, tape storage containing archived data from collisions past, and the network glue that holds it all together. Computing resources at the RACF are split about equally between the RHIC collaborations and the ATLAS experiment running at the Large Hadron Collider in Europe.
For RHIC, the data comes from heavy ions or polarized protons that smash into each other inside PHENIX and STAR. These detectors catch the subatomic particles that emerge from the collisions and record information about them, such as particle species, trajectories, and momenta, in the form of electrical signals. Most signals aren't relevant to what physicists are looking for, so only signals that trip predetermined triggers are recorded. For example, because the main focus of Run 13 is the proton's "missing" spin, physicists are particularly interested in finding electrons from the decays of particles called W bosons. These electrons can be used as probes to quantify the spin contributions from a proton's antiquarks and from different "flavors" of quarks.
Computers in the "counting houses" at STAR and PHENIX package the raw data collected from selected electrical signals and send it all to the RACF via dedicated fiber-optic cables. The RACF then archives the data and makes it available to experimenters running analysis jobs on any of our 20,000 computing cores.
Polarized protons are far smaller than heavy ions, so they produce considerably less data when they collide. Even so, when we talk about data at the RACF, we're talking about a lot of data. During Run 12 last year, we began using a new tape library that increased storage capacity by 25 percent, for a total of 40 petabytes, the equivalent of 655,360 of the largest (64-gigabyte) iPhones available today. We also more than doubled our ability to archive data for STAR last year to meet the needs of a data acquisition upgrade, so we can now sustain 700 megabytes of incoming data every second for both PHENIX and STAR. Part of this gain comes from new fiber-optic cables connecting the counting houses to the RACF, which provide both higher data rates and redundancy.
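For readers who want to check the storage numbers, here is a back-of-envelope sketch. It assumes that the "largest iPhone" holds 64 GB and that binary prefixes are used (1 PB = 1,048,576 GB), which is the convention that makes the 655,360 figure come out exactly. It also reads "by 25 percent for a total of 40 petabytes" as 40 PB *after* the increase. These are our illustrative assumptions, not an official RACF calculation.

```python
# Back-of-envelope check of the tape storage figures.
# Assumptions (illustrative only): binary prefixes, a 64 GB "largest iPhone",
# and 40 PB being the total capacity after the 25 percent increase.

GB_PER_PB = 1024 ** 2   # binary convention: 1 PB = 1,048,576 GB
IPHONE_GB = 64          # assumed capacity of the largest iPhone in 2013

total_pb = 40           # tape capacity after the Run 12 upgrade
increase = 0.25         # fractional capacity increase from the new library

# Capacity before the upgrade, if 40 PB is 125 percent of the old total.
previous_pb = total_pb / (1 + increase)

# Number of 64 GB phones needed to hold the same amount of data.
iphones = total_pb * GB_PER_PB // IPHONE_GB

print(f"capacity before upgrade: {previous_pb:.0f} PB")
print(f"iPhone equivalent: {iphones:,}")
```

Under these assumptions the library grew from about 32 PB to 40 PB, and the 655,360 phone count follows directly.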
With all this in place, along with those 20,000 processing cores (most computers today have two or four), certain operations that used to require six months of computer time can now often be completed in less than a week.
If pending budgets allow the full planned 15-week run, we expect to collect approximately four petabytes of data from this run alone. During the run, we meet formally each week with liaisons from the PHENIX and STAR collaborations to discuss how much data to expect in the coming weeks and to assess their operational needs. Beyond these meetings, we stay in continual contact with our users as we monitor and improve system functionality, troubleshoot problems, and provide first-line user support.
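To put four petabytes in perspective, the rough sketch below spreads that total evenly over a full 15-week run and compares the result with the 700 megabytes per second archiving rate mentioned earlier. It assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), round-the-clock data taking, and takes the 700 MB/s figure at face value. Real runs include downtime, so actual rates during physics running would be higher than this average.

```python
# Rough average ingest rate implied by the Run 13 plan.
# Assumptions (illustrative only): ~4 PB spread evenly over a full 15-week run,
# decimal units, and no downtime. Real runs have gaps, so rates during
# physics running would be higher than this average.

SECONDS_PER_WEEK = 7 * 24 * 3600
run_weeks = 15
expected_bytes = 4 * 10**15     # ~4 PB expected from this run
archive_limit_mb_s = 700        # sustained archiving rate quoted in the article

run_seconds = run_weeks * SECONDS_PER_WEEK
avg_mb_s = expected_bytes / run_seconds / 10**6   # bytes/s converted to MB/s

print(f"average rate: {avg_mb_s:.0f} MB/s")
print(f"fraction of sustained capacity: {avg_mb_s / archive_limit_mb_s:.0%}")
```

On these assumptions the average works out to roughly 441 MB/s, about 63 percent of the sustained archiving rate, which leaves some headroom for periods of heavier data taking.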
We'll also continue to work with experimenters to evaluate computing trends, plan future upgrades, and test the latest equipment, all in an effort to minimize the bottlenecks that slow data on its way to users and to get the most bang for the buck.
— Shigeki Misawa
Group Leader, RACF Mass Storage and General Services
— Ofer Rind
Technology Architect, RACF Storage Management