BNL Scientific Data and Computing Center Projects
A High Performance Network Architecture to Support Scientific Research at BNL
In the era of “Big Data”, unprecedented volumes of data will be generated by an ever-increasing number of scientific experiments, potentially at very high data rates. Even within a data center, managing large data volumes and high data rates is extremely difficult. Much of this data is expected to be generated at locations on the BNL campus far from the main BNL data center, which makes data management substantially harder. Most of these locations lack the physical infrastructure (space, power, and cooling) to support the required “off line” collection, storage, and analysis of this data; many cannot even support the equipment needed for “on line” data management. Several of these locations host multiple, transient scientific experiments over the course of a year, and almost none are prepared to provide access to data and compute resources for experimenters who have finished collecting data and are no longer physically present at the experiment site. In anticipation of these looming problems, the Network Group in the Information Technology Division (ITD) and the RHIC/ATLAS Computing Facility (RACF) have architected and deployed a new, high-performance network fabric dedicated to connecting these disparate locations to the BNL data center. The capabilities of this fabric alleviate many of the problems expected in the era of Big Data.
The HPC Core network fabric is a new, high-performance network fabric that is completely separate from the standard BNL campus network. It is designed to connect experimental equipment (“on line” processing equipment) directly to a fabric that extends to the BNL data center. On typical campus networks, directly connecting experimental equipment is either prohibited or highly restricted, because placing these systems on a traditional network makes them accessible to every other system connected to it. To mitigate this exposure, equipment associated with scientific instruments is typically protected by firewalls or placed on a partitioned local network reachable only through “bastion” hosts. In the era of Big Data, these isolation mechanisms will seriously impede scientific research. The HPC Core network eliminates the need for them by enabling selective connectivity to disparate networks, at full network line rate, via the Internet-standard Border Gateway Protocol (BGP). Using BGP eliminates the operational complexities of providing selective access through router access control lists (ACLs). With this capability, network-attached equipment at an experiment site can securely reach resources at a location that is physically remote from the experiment site. The level of security is also configurable, from the most secure (a private data-center extension) to moderate (access to centrally provided compute and storage resources). With appropriate design and deployment of those centrally provided compute and storage resources, an experiment site can be nearly as secure as with the legacy “bastion host” connectivity, while experiment data is immediately accessible “everywhere” without the need to copy or move it.
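The selective-connectivity idea can be illustrated conceptually: with BGP, a site router simply advertises (or withholds) entire prefixes to a peer, rather than maintaining per-flow ACL rules. The sketch below models this as an allow-list filter over advertised routes; the prefixes and function names are purely illustrative, not BNL's actual configuration.

```python
# Conceptual sketch of BGP-style selective connectivity (hypothetical
# prefixes; not the actual HPC Core configuration). Only routes covered
# by an allow-list of prefixes are advertised to a peer, so unlisted
# networks at the site remain unreachable from that peer.
import ipaddress

def advertise(routes, allowed_prefixes):
    """Return only the routes covered by the allow-list, mimicking a
    BGP prefix-list applied to outbound route advertisements."""
    allowed = [ipaddress.ip_network(p) for p in allowed_prefixes]
    return [
        r for r in routes
        if any(ipaddress.ip_network(r).subnet_of(a) for a in allowed)
    ]

# Example: an experiment-site router advertises only its data-transfer
# subnets to the data-center peer; the instrument control network
# (192.168.50.0/24 here) is never announced.
site_routes = ["10.10.1.0/24", "10.10.2.0/24", "192.168.50.0/24"]
permit_list = ["10.10.0.0/16"]  # hypothetical allow-list
print(advertise(site_routes, permit_list))
# → ['10.10.1.0/24', '10.10.2.0/24']
```

In a real deployment this policy lives in the routers' BGP configuration; the point is that access control is expressed once, per prefix, instead of as a growing set of ACL entries.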
The HPC Core network is now in full production, serving the STAR and PHENIX experiments at the Relativistic Heavy Ion Collider (RHIC), the CFN Electron Microscopy Group, the Collider-Accelerator Department, and a variety of computational and storage services at the RACF. The network will also serve the BNL Institutional HPC Cluster, which is expected to be brought on line in 2016.