Data in parallel

6 April 2009




One of the potential pitfalls of building an enormously complex experiment which is its own prototype is that occasionally, despite meticulous planning, something slips under the radar. So runs the story of ATLAS’s “parallel partition” data-taking capability – the recently developed feature which now allows several data-taking sessions to be defined and run simultaneously, side by side.

“When developing the system at the beginning, people were thinking to the single run only,” explains Trigger and Data Acquisition (TDAQ) engineer Wainer Vandelli. “Then you discover that you need to run multiple partitions, and you think, ‘Yeah it’s clear, I knew already, but somehow I didn’t think about it,’” he smiles, with the benefit of hindsight.

The requirement to run several parallel partitions – which define different data-taking sessions – arose at the end of spring 2008, just before ATLAS officially began continuous cosmic data-taking. The sub-detectors were at the calibration stage, and each had its own priorities for how it wanted to use the runs.

For example, the calorimeters may have wanted to run with the L1 Calo trigger to calibrate and tune their electronics, while the Inner Detector may have been more interested in collecting as many muon tracks as possible in order to study detector alignment. Parallelising these conflicting requirements, rather than waiting for one run to end before the other could begin, was a matter of efficiency.

Up until that stage, it was possible to run partitions in parallel at Point 1, but only to a limited extent. “The problem was that, depending on the configuration, one ended up either having multiple, parallel files per run [which had to be joined together offline] or having to run a dedicated online gathering application on detector-provided nodes,” explains Wainer. Things got even more complicated if you tried to add another detector to the run.

The solution was to add an “extra layer” of TDAQ infrastructure, so that all the information converges into a single file at the event-builder level. This has made it possible to run easily with other detectors to check cross-detector information, and it can all be done online.
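To picture the idea, here is a minimal sketch in Python. The names (Fragment, EventBuilder, and the partition labels) are invented for illustration and do not correspond to the real TDAQ software; the point is simply that fragments from several sub-detectors are routed to a per-partition builder, so each session converges to one merged output rather than a set of per-detector files that would have to be joined together offline.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Fragment:
    """A chunk of detector data, tagged with its source and event number."""
    detector: str
    event_id: int
    payload: bytes

class EventBuilder:
    """Illustrative per-partition builder: merges fragments from all
    participating detectors into a single output stream per partition."""
    def __init__(self, partition: str):
        self.partition = partition
        self.events = defaultdict(list)   # event_id -> list of fragments

    def add(self, fragment: Fragment) -> None:
        self.events[fragment.event_id].append(fragment)

    def write(self) -> list[str]:
        # One merged record per event, instead of one file per detector.
        return [
            f"{self.partition} event {eid}: "
            + " + ".join(f.detector for f in frags)
            for eid, frags in sorted(self.events.items())
        ]

# Two hypothetical partitions running side by side,
# each converging to its own single output.
builders = {"CaloCalib": EventBuilder("CaloCalib"),
            "IDCosmics": EventBuilder("IDCosmics")}
builders["CaloCalib"].add(Fragment("LAr", 1, b""))
builders["CaloCalib"].add(Fragment("Tile", 1, b""))
builders["IDCosmics"].add(Fragment("Pixel", 1, b""))

for builder in builders.values():
    print("\n".join(builder.write()))
```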

“The configuration for these partitions is really modular, from the TDAQ point of view,” explains Wainer. “We can switch on and off single computers, across 2000 of them.” The detectors can be included or excluded in varying degrees of modularity, from the sub-detector level right down to individual chambers, with each ‘module’ feeding into a maximum of one partition at any given time.
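A rough way to visualise that rule, again only as a sketch with hypothetical module and partition names rather than the real configuration database, is as a check that no enabled module is assigned to more than one partition at a time:

```python
# Hypothetical sketch of the "each module feeds at most one partition" rule.
# Module and partition names are invented for illustration.

partitions = {
    "CaloCalibration": ["LAr_barrel", "Tile_barrel"],
    "MuonCosmics": ["MDT_sector_5", "RPC_sector_5"],
}

disabled = {"Tile_extended_barrel"}   # modules switched off entirely

def validate(partitions: dict[str, list[str]], disabled: set[str]) -> None:
    seen: dict[str, str] = {}
    for partition, modules in partitions.items():
        for module in modules:
            if module in disabled:
                raise ValueError(f"{module} is disabled but assigned to {partition}")
            if module in seen:
                raise ValueError(
                    f"{module} would feed both {seen[module]} and {partition}"
                )
            seen[module] = partition

validate(partitions, disabled)
print("Configuration OK: every module feeds at most one partition.")
```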

This feature will come into its own in full physics running mode by allowing testing to be done on singled-out problem areas while the rest of the combined run continues unaffected. Not only that, but in between beam runs of ten hours or so, while the ring is being re-filled with protons, each detector will be able to take individual calibration data within the strictly limited time available.

“Each detector will want to run in dedicated trigger and detector configuration, because the calibration is really detector specific,” explains Wainer. “So we want several runs. But all in parallel, and all at the same time – because in a few hours, the beam is coming back!”

As is often the case with back-designing, there was one major snag which threatened the successful running of the new system. The data flow manager (DFM) application, which orchestrates the data collection and deletion processes in order to make room for constantly arriving fresh data, works on a ‘broadcast’ system: when the time comes to send a “clear data” message, it sends a single message to each of the 150 computers in the read-out system. When several data-taking partitions are running in parallel, the risk is that the DFM of one of them will delete data which is still required on the other data paths.

“This was a major single point of failure,” remembers Wainer. The issue was tackled with a re-configuration at the software level: “Now you can define the mailing list that you attach to each partition,” he says. This ‘mailing list’ identifies which computers in the read-out system should receive the ‘clear’ messages broadcast for a given partition. The changes were introduced in the latest software release, which came out in December 2008.
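In outline, and this is only an illustrative sketch with invented node and partition names rather than the actual DFM interface, the change amounts to replacing a blanket broadcast with a send to the partition’s own list of read-out nodes:

```python
# Illustrative sketch only: scoping "clear data" messages to a per-partition
# mailing list stops one partition from clearing another's data.
# Node and partition names are invented; this is not the real DFM code.

ALL_READOUT_NODES = {f"ros-{i:03d}" for i in range(150)}

MAILING_LISTS = {
    "CaloCalibration": {"ros-001", "ros-002", "ros-003"},
    "MuonCosmics": {"ros-100", "ros-101"},
}

def clear_broadcast(event_id: int) -> set[str]:
    """Old behaviour: every read-out node receives the clear message,
    regardless of which partition the event belonged to."""
    return ALL_READOUT_NODES

def clear_scoped(partition: str, event_id: int) -> set[str]:
    """New behaviour: only the nodes on the partition's mailing list
    are told to delete the event's fragments."""
    return MAILING_LISTS[partition]

# A clear issued by the calorimeter partition no longer reaches the muon nodes.
assert clear_scoped("CaloCalibration", 42).isdisjoint(MAILING_LISTS["MuonCosmics"])
print(sorted(clear_scoped("CaloCalibration", 42)))
print(len(clear_broadcast(42)), "nodes would have received the old broadcast")
```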

Ceri Perkins

ATLAS e-News