From BNL to KISTI: Establishing High Performance Data Transfer From the US to Asia
By Dantong Yu, Jerome Lauret and In-Kwon Yoo
Modern high energy and nuclear physics experiments yield huge amounts of data and thus require efficient, high-capacity storage and transfer. As one such massive data-generating experiment, the RHIC/STAR experiment can acquire as many as 1000 collision events per second during run-time. The resulting flood of data requires transferring strategic physics samples around the world. BNL, as the center of operations for RHIC, plays a pivotal role in transferring data to and from other sites in the US and around the world in a tiered fashion for further distribution and analysis.
The Korea Institute of Science and Technology Information (KISTI) is the first computing facility to have joined the STAR collaboration as a full institutional member. Continuing its tradition of redistributing data to Asian collaborators (Asia hosts 27% of STAR's institutional workforce), the collaboration has targeted the coming runs for transferring data in real time and in a sustained manner from the Counting House to KISTI. With its data processing capabilities (up to 20% of the data could be processed there), KISTI could then be leveraged for fast calibration and analysis, as well as serving as a data redistribution hub for STAR's Asian collaborators.
To reach this objective, BNL's RACF and ITD networking, STAR, and KISTI groups teamed with specialists from DOE's ESnet (Energy Sciences Network), KREONet2 (Korea Research Environment Open Network2), and GLORIAD (Global Ring Network for Advanced Applications Development). The data transfer challenge conducted between December 18-22, 2008 established an unprecedented sustained data rate of 600 Mbits/second from the US to the Asian mainland, the fastest data transfer rate ever achieved over the GLORIAD environment between BNL in the US and KISTI in Korea.
Figure 1 shows the entire data transfer chain. The five teams worked around the clock to tune the network and storage servers and achieved a stable 600 Mbits/second over the entire data transfer period, as shown in Figures 2 and 3, with only two minor temporary interruptions. The first interruption was caused by a BNL-side power outage. The second was due to an initial pilot error that directed the data flow to the wrong data store, unexpectedly filling up the storage system. Both problems were quickly diagnosed and resolved, thanks to the care and attention of the personnel in place for the exercise.
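To put the sustained rate in perspective, a back-of-the-envelope calculation converts 600 Mbits/second into a daily data volume. This is an illustrative estimate only; the actual volume moved during the December 18-22 challenge is not stated here.

```python
# Illustrative estimate: daily data volume implied by a sustained
# 600 Mbit/s transfer rate (not a measured figure from the exercise).
rate_mbit_s = 600                        # sustained rate, megabits per second
rate_mb_s = rate_mbit_s / 8              # bits -> bytes: 75 megabytes per second
per_day_tb = rate_mb_s * 86_400 / 1e6    # MB per day -> terabytes per day

print(f"{rate_mb_s:.0f} MB/s, ~{per_day_tb:.2f} TB/day")
# -> 75 MB/s, ~6.48 TB/day
```

At that rate, a multi-day challenge like this one corresponds to tens of terabytes crossing the Pacific, which illustrates why sustained (rather than peak) throughput was the relevant metric.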
Concentrating on the redistribution of data through a “Data Grid” strategy could provide analysis viability, sustainability, reliability, and increased local scientific opportunity: remote scientists can harness analysis capabilities local to their home institutions, contribute more resources (both hardware and human potential), and become deeply involved in the life-cycle of an experiment. Such a data delivery exercise is well in line with supporting globally distributed computing paradigms that exploit modern data storage, computing, and networking infrastructures.