National Lab Facility Staff and DOE Computer Scientists Collaborate on Projects to Speed Up Experimental Data Analysis


Participants in the inaugural hackathon, left to right. First row: Julien Lhermitte, Barbara Frosik, Sameera Abeykoon, Jiao Lin, Wei Xu, Annie Heroux, Dean Hidas
Second row: Keith Beattie, Luis Barroso-Luque, Dariusz Jarosz, Joaquin Correa, Hubertus Van Dam, Kevin Yager, Dantong Yu, Eric Dill, Daniel Allan, Yong-Nian Tang, Doga Gursoy, Nikolay Malitsky
Third row: Jean-Christophe Bilheux, Thomas Caswell, Raymond Osborn
Last row: Christopher O'Grady, Arman Arkilic, Li Li, Matt Cowan, Ken Lauer, Arthur Glowacki, Bo Jayatilaka, Pavol Juhas
Not pictured: Gabriel Iltis, Daron Chabot, Shinjae Yoo, Kerstin Kleese Van Dam, John Hill, Michael Ernst

In early December, the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory hosted the first in a series of weeklong “hackathons.” The code brainstorming session drew nearly 40 computer scientists and software developers from several DOE Office of Science User Facilities, including those at Argonne, Berkeley, Oak Ridge and SLAC national laboratories.

The hackathon series was launched at the initiative of the directors of the five DOE x-ray light sources, working together with the directors of the DOE neutron scattering sources, to help tackle common data challenges at their respective facilities.

“The idea was for the facility representatives to bring their data so everyone could collaborate on developing fast-analysis pipelines that can be used across the complex,” explained NSLS-II Director John Hill. “We believe that this is the most efficient approach and will give the best value to our facility users.”

Staff from the National Synchrotron Light Source II (NSLS-II) and Brookhaven Lab’s Computational Science Initiative (CSI) organized the event. About half the participants came from Brookhaven Lab groups and facilities, including NSLS-II, the Center for Functional Nanomaterials (CFN), the Condensed Matter Physics and Materials Science Department and CSI. The daily sessions were held at NSLS-II. (NSLS-II and CFN are both DOE Office of Science User Facilities.)

“It’s all about data,” said Annie Heroux, group leader of Data Acquisition, Management, and Analysis for NSLS-II. “All of the facilities are upgrading and have brighter beams and faster and more pixelated detectors. As a result, they are generating data faster than ever before. In order for our users to be successful, we have to give them the tools they need to view this data and analyze it in near real-time so that they can make informed decisions about their experiments.

“This is a real challenge and will require a lot of work on all aspects of the problem: from moving the data off the detector onto disk, to initial visualization and analysis, to long-term storage and detailed analysis,” Heroux explained. “This is the first time we have tried a hackathon, and we were all curious to see how it would go. In fact, it went very well indeed, exceeding our expectations. It was really a very exciting week!”

Each participating lab brought along a project it wanted to work on, and cross-laboratory teams were formed to collaborate on modules of code that will be shared with the other facilities.

CFN’s Kevin Yager was in a group with Li Li (NSLS-II) and Dantong Yu and Wei Xu from Brookhaven Lab’s Computational Science Center. 

“We created a simple machine-learning data analysis pipeline,” Yager said. “We took a dataset recently collected at the CHX beamline at NSLS-II and used machine-learning methods to automatically cluster and categorize the data. This is part of a broader strategy, where we will be applying advanced machine-learning methods to handle the enormous datasets at NSLS-II. The hackathon provided a fantastic opportunity to collaboratively test out these ideas.”

“We completed our first experiment,” said Xu. “Personally, I learned many hands-on techniques, as well as updates to a few Python software packages, both from our project and from other people’s reported results.”

Yager added, “As a whole, the groups worked on a wide variety of problems, everything from highly optimizing particular analysis routines up through creating mock pipelines and testing distributed algorithms. But perhaps the greatest benefit of the hackathon was the ability to meet like-minded researchers from other facilities. A host of new collaborations were catalyzed.”

Yager said there was clear consensus that future code should be shared and collected in a single software project, which will be called “scikit-beam.”

Work on scikit-beam has been proceeding well, according to Brookhaven Assistant Computational Scientist Thomas Caswell. Two features implemented during the hackathon, fast accumulating histograms and streaming one-time XPCS (x-ray photon correlation spectroscopy), have been merged into scikit-beam and will be available to NSLS-II users and staff in the upcoming run.

“The most valuable outcome from the hackathon is the relationships between developers across the DOE complex, as these will be the basis for future collaborations,” Caswell said. “Those collaborations are what will enable the DOE facilities to deliver the data analysis required to fully exploit the new hardware.”

Jean-Christophe Bilheux from Oak Ridge National Laboratory confessed to some initial skepticism about what could be accomplished, but was quickly won over.

“We had presentations on the first day about the tools we were going to use, and that’s when I started to realize that it could be a very useful week for our project,” he said.

Each lab gave a presentation about its project and then began work with the help of experts on programming languages such as Python and on tools including TomoPy, an open-source imaging toolkit.

“Any problem or question was solved in a matter of minutes,” Bilheux said. “I started to use Python a couple of years ago, and we are planning to provide a lot of tools to our users using it. I got in a week what could have taken me months of research.”

“We came with the hope of automating our 2D neutron imaging normalization,” he said. “We didn’t finish the project, but we learned about other tools, such as TomoPy, that already provide some of our needs. Thanks to the presence of its main developer, we were able to get it running on our system with our data. Then we set up a repository with all the right tools to get the documentation and the automatic testing up and running.”

TomoPy, an imaging toolkit developed at the Advanced Photon Source (APS) at Argonne, was used primarily within the synchrotron community before this event. The hackathon provided an opportunity to quickly adapt Argonne’s earlier work for use at neutron sources. The lead developer of TomoPy, APS Assistant Computational Scientist Doga Gursoy, said, “The hackathon gave software developers and users the chance to work side-by-side. I was able to get much farther and faster with scientists from Oak Ridge and the Advanced Light Source (Berkeley) than if we had tried to do the same work through phone calls and e-mails.”

Facing a substantial workload, participants put in eight-hour days of intense, continuous programming, but the hackathon also had a social component.

“We didn’t only meet experts; we also made friends, always the secret to a good collaboration,” Bilheux said.

“The hackathon provided a great opportunity for those new to this community to become familiar with what’s going on,” said Dariusz Jarosz, a recent hire at Argonne. “Now when I’m working on a project I know who is working on similar problems and where I can start to look to reuse code.”

Plans call for other facilities to host hackathons every few months, organized around different themes. The focus of the Brookhaven event was pipelines and workflows, and streaming data analysis. The next hackathon, on networking, will be held at Lawrence Berkeley National Laboratory. Argonne’s APS is organizing the third, on multimodal data analysis, to be held in the spring.

“This hackathon established the foundation of future collaborations,” said Yager. “I expect future hackathons in this series to give us a chance to work closely on data-analysis challenges common to all facilities.”

Brookhaven National Laboratory is supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

2016-6110