Tier-0 on task

1 December 2008

Recent Tier-0 performance

As you read this sentence, cosmic events in ATLAS are being fed through the trigger and data acquisition system to the Tier-0.  They may seem like a trickle compared with the flash of the collisions that ATLAS is designed to capture, but the almost 500 million events recorded in the last four months have generated 1.1 petabytes (about 1.1 million gigabytes) of raw data, not much less than ATLAS is supposed to record in that amount of time during a run with beam.

“We are more or less up to the nominal rates,” Armin Nairz, ATLAS Tier-0 Operations Coordinator, confirms. The cosmic rays provide a complete end-to-end exercise of the system.  Data from the detectors go through the trigger system and are assembled online.  They are written to files and recorded to CASTOR, CERN’s mass storage system, and a “handshake” database is in place to notify the Tier-0 about the new data.

Between the arrival of the cosmic data and the point when processing can begin: “There is a delay of usually not more than an hour,” says Luc Goossens, ATLAS Tier-0 Software Development Coordinator. Once there are collisions, only part of the data – the so-called “express” and “calibration streams” – will be processed that promptly. The bulk “physics streams” will have to wait until suitable calibration and alignment constants have been calculated, a process that is expected to take about one day.

“We pick up the data and start processing them on our batch farm,” says Armin.  The farm currently contains 1,500 processing cores, but it is just a subset of the common farm available to CERN users, which contains around 10,000 cores in about 1,600 machines.

After picking up the raw data from CASTOR, the Tier-0 runs the first-pass event reconstruction, producing, among many other data products, event summary data (ESD), combined n-tuples (CBNT), and analysis object data (AOD). The software also produces histograms for the offline data quality monitoring team and publishes the information to the web.

On September 10th, the first beam splash events rushed through the system, and the Tier-0 team was able to post results after a mere two hours.  “We have been preparing for taking data for a couple of years already,” says Armin. The ATLAS Tier-0 team performed standalone throughput exercises to make sure that the system could handle the intended amount of data.

The Tier-0 hardware resources have been set up to cope with a raw data rate of about 300 megabytes per second, at an event rate of 200 per second. Those are the nominal rates agreed between the online and offline communities in the so-called “computing model”. There are contingencies, but the available Tier-0 bandwidth also has to be shared with many other activities besides data taking, such as the reading and writing of processed products, tape archiving and, most notably, the data export to the Tier-1 centers.
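For readers who like to check the arithmetic, the nominal rates can be turned into an event size and a daily data volume. The short Python sketch below does this using only the figures quoted in this article; the function names, the decimal unit conversion and the assumption of uninterrupted running are illustrative choices, not part of the ATLAS computing model.

```python
# Back-of-the-envelope arithmetic on the rates quoted in the article.
# Function names and unit conventions are illustrative assumptions.

RAW_RATE_MB_PER_S = 300.0      # nominal raw data rate (from the article)
EVENT_RATE_HZ = 200.0          # nominal event rate (from the article)
COSMIC_VOLUME_PB = 1.1         # cosmic raw data over four months (from the article)
COSMIC_EVENTS = 500e6          # "almost 500 million" events, so an upper estimate

SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000          # decimal units, assumed
MB_PER_PB = 1_000_000_000      # decimal units, assumed


def implied_event_size_mb(rate_mb_per_s, event_rate_hz):
    """Average event size implied by a data rate and an event rate."""
    return rate_mb_per_s / event_rate_hz


def daily_volume_tb(rate_mb_per_s, live_fraction=1.0):
    """Raw data written per day; live_fraction=1.0 assumes nonstop running."""
    return rate_mb_per_s * SECONDS_PER_DAY * live_fraction / MB_PER_TB


def average_event_size_mb(volume_pb, n_events):
    """Average event size from a total volume and an event count."""
    return volume_pb * MB_PER_PB / n_events


nominal_size = implied_event_size_mb(RAW_RATE_MB_PER_S, EVENT_RATE_HZ)
per_day = daily_volume_tb(RAW_RATE_MB_PER_S)
cosmic_size = average_event_size_mb(COSMIC_VOLUME_PB, COSMIC_EVENTS)

print(f"Nominal event size:        {nominal_size:.1f} MB")
print(f"Nominal volume per day:    {per_day:.2f} TB (if running nonstop)")
print(f"Average cosmic event size: {cosmic_size:.1f} MB (approximate)")
```

Under these assumptions the nominal rates correspond to about 1.5 megabytes per event and roughly 26 terabytes per day of continuous running, while the cosmic totals work out to a little over 2 megabytes per event on average.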

Apart from the bandwidth: “The bottleneck is really the reconstruction time,” Luc adds. “The time it takes to reconstruct an event (about ten seconds) is limiting the rate, as we have only a limited amount of CPU resources.”

The cosmic events we have been recording can be up to ten times the expected size of collision events, and the average reconstruction time is about twice the time foreseen for reconstruction of an average collision event. This leads to the counter-intuitive effect that often the maximal event rate the Tier-0 can handle is lower for the “simple” cosmic events than for real collisions. 
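A rough way to see why this happens is to treat the Tier-0 as limited by whichever runs out first: CPU or bandwidth. The Python sketch below is a deliberately simplified model, not the Tier-0 team's own calculation. It assumes that the ten-second reconstruction time refers to cosmic events (so about five seconds for a collision event), that all 1,500 cores reconstruct events with perfect efficiency, that the full 300 megabytes per second is available for raw data, and that a collision event is 1.5 megabytes (the nominal rates divided out). The cosmic case uses the worst-case size of ten times that.

```python
# Simplified throughput model: the sustainable event rate is capped by
# both CPU and bandwidth. All inputs below are assumptions built from
# the figures quoted in the article, not official ATLAS parameters.

def max_event_rate_hz(cores, reco_s_per_event, bandwidth_mb_per_s, event_size_mb):
    """Return the lower of the CPU-limited and bandwidth-limited event rates."""
    cpu_limit = cores / reco_s_per_event            # events/s the farm can reconstruct
    io_limit = bandwidth_mb_per_s / event_size_mb   # events/s the bandwidth can carry
    return min(cpu_limit, io_limit)


CORES = 1500                 # Tier-0 share of the batch farm (from the article)
BANDWIDTH_MB_PER_S = 300.0   # nominal raw data rate (from the article)

# Assumed: collision events at the nominal size, reconstructed in ~5 s.
collision_rate = max_event_rate_hz(CORES, 5.0, BANDWIDTH_MB_PER_S, 1.5)

# Assumed: worst-case cosmic events, ten times larger, reconstructed in ~10 s.
cosmic_rate = max_event_rate_hz(CORES, 10.0, BANDWIDTH_MB_PER_S, 15.0)

print(f"Collision events: up to {collision_rate:.0f} per second")
print(f"Large cosmic events: up to {cosmic_rate:.0f} per second")
```

In this toy model both the CPU limit and the bandwidth limit are lower for the large cosmic events than for collisions, which reproduces the direction of the effect. The real limits depend on how event sizes and reconstruction times are distributed and on how much bandwidth the other activities take.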

As we approach December, most of the subsystems will stop taking data as hardware work begins. However, the Inner Detector had limited time to calibrate and align prior to September 10th, so it will continue to run as long as possible.  In any case, the Tier-0 will keep running and support all detector commissioning activities as needed.

On the software end of Tier-0, Luc describes the current activities as the “icing on the cake”. They’re working on automating the system as much as possible. As he envisions it: “It will be like a big factory which normally works by itself…and then there are people on shifts looking at a big screen, hopefully with a lot of green stuff on it.” However, if a “red alert” appears, the shifter will have to take the necessary actions to get it resolved.

This graphical interface is one of the last missing pieces of the ATLAS Tier-0 software puzzle, and the team already has prototypes.  “We will have it finished by the time the first beams come back,” Luc says.


Katie McAlpine

ATLAS e-News