Ensuring quality of ATLAS data

7 September 2009


The assessment of data quality is an integral and essential part of any HEP experiment. This is all the more true for ATLAS, given the extreme complexity of the detector and the challenging experimental environment. Ultimately, these checks, performed before the data reach physics analysis, assure its scientific validity.

The status of ATLAS data taking is evaluated from information provided by the data acquisition and trigger systems (TDAQ) and from the analysis of events reconstructed online and offline at the Tier-0 centre; together these constitute the Data Quality Assessment (DQA). DQA comprises data quality monitoring (DQM), evaluation, and flagging for future use in physics analysis.


Data quality is assessed, both online and offline, by checking histogram content and comparing it against reference histograms. The results of these checks, along with information from the Detector Control System (voltages, temperatures, etc.), are used to determine the status of each detector. Event display applications, such as Atlantis and VP1, aid data visualisation.
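As a rough illustration of how such information could be combined, the following sketch folds per-histogram check results and DCS readings into a single detector status using a worst-case rule; the flag names and combination logic are hypothetical rather than the actual DQMF implementation.

```python
# Illustrative only: a minimal sketch of folding histogram-check
# results and Detector Control System (DCS) readings into a single
# per-detector status. The flag names and the worst-case combination
# rule are hypothetical, not the actual DQMF implementation.
from enum import IntEnum

class Status(IntEnum):
    GREEN = 0    # everything nominal
    YELLOW = 1   # tolerable deviation, to be reviewed
    RED = 2      # failed check or hardware out of range

def dcs_status(voltage_ok, temperature_ok):
    """Map DCS readings (voltages, temperatures, ...) to a status."""
    return Status.GREEN if (voltage_ok and temperature_ok) else Status.RED

def detector_status(check_results, dcs):
    """Worst-case combination: a detector is only as good as its
    worst histogram check or DCS condition."""
    return max([dcs, *check_results])

status = detector_status([Status.GREEN, Status.YELLOW], dcs_status(True, True))
print(status.name)  # YELLOW
```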



Both the online and offline DQM use a common software structure, the DQM Framework (DQMF). DQMF automatically analyses the data with user-defined algorithms according to a given configuration. The analysis mainly consists of checking whether histograms are filled as expected, whether the r.m.s. lies within thresholds, and so on. Histograms can also be compared to reference ones with algorithms such as χ² tests and shape comparisons. Because of the different timing requirements, the online software is a distributed application, while the offline version is a non-networked, standalone program.
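For concreteness, the sketch below shows the flavour of these automated checks, an r.m.s. threshold test and a per-bin χ² comparison against a reference; the bin contents, thresholds, and function names are purely illustrative and do not reflect the real DQMF code.

```python
# Illustrative automated checks in the spirit of DQMF: an r.m.s.
# threshold test and a per-bin chi-square comparison against a
# reference histogram. Values and names are invented for the example.
import math

def rms(bins):
    """Root-mean-square deviation of the bin contents."""
    mean = sum(bins) / len(bins)
    return math.sqrt(sum((b - mean) ** 2 for b in bins) / len(bins))

def rms_within_thresholds(bins, low, high):
    """The 'is the r.m.s. within the thresholds?' check."""
    return low <= rms(bins) <= high

def chi2_per_bin(observed, reference):
    """Pearson chi-square per bin between data and reference shapes."""
    chi2 = sum((o - r) ** 2 / r for o, r in zip(observed, reference) if r > 0)
    return chi2 / len(observed)

observed = [102, 98, 110, 95, 99]
reference = [100, 100, 100, 100, 100]
print(rms_within_thresholds(observed, 0.0, 10.0))   # True
print(chi2_per_bin(observed, reference) < 2.0)      # True: shapes agree
```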

The primary mandate of the online DQM is to give fast feedback to the shift crew so that good-quality data are taken. Raw detector data are accessed in the event flow and examined for hardware and configuration failures. To fulfil this task, the online DQMF interacts with the Online Services provided as part of the TDAQ software infrastructure: the Information Service is used to retrieve the data and publish the results, while the Online Histogramming service provides access to the histograms. Of the more than 10 million histograms produced by over 150 monitoring applications, DQMF automatically checks 50,000 every minute. It also visualises the results graphically, adopting a hardware-oriented view to ease their interpretation.
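The sketch below illustrates this retrieve-check-publish cycle in miniature, with simple in-memory stand-ins for the two services; the real TDAQ interfaces are far richer, and none of their actual APIs are reproduced here.

```python
# Purely illustrative sketch of the online cycle: retrieve each
# monitored histogram, run a check, publish the result. The two
# classes below are stand-ins for the TDAQ Online Histogramming
# service and Information Service, not their real APIs.

class HistogramService:
    """Stand-in for the Online Histogramming service."""
    def __init__(self, histograms):
        self._histograms = histograms            # name -> bin contents
    def names(self):
        return list(self._histograms)
    def retrieve(self, name):
        return self._histograms[name]

class InformationService:
    """Stand-in for the Information Service used to publish results."""
    def publish(self, name, passed):
        print(f"{name}: {'OK' if passed else 'FAIL'}")

def run_checks(hist_svc, info_svc, check):
    # One pass over all monitored histograms; online DQMF repeats
    # this roughly once per minute over some 50,000 histograms.
    for name in hist_svc.names():
        info_svc.publish(name, check(hist_svc.retrieve(name)))

histograms = {"pixel_occupancy": [5, 7, 6, 0], "sct_hits": [3, 4, 5, 4]}
run_checks(HistogramService(histograms), InformationService(),
           check=lambda bins: all(b > 0 for b in bins))
# pixel_occupancy: FAIL  (an empty bin fails the "filled as expected" test)
# sct_hits: OK
```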

In the ATLAS computing model, offline DQM is part of the full event reconstruction, which occurs for the first time at the Tier-0 centre. This more refined monitoring covers sub-detector, performance-, and physics-oriented DQA, and can reveal problems both in data taking and in the processing chain. Offline DQMF runs as soon as the data-taking conditions become available at the Tier-0 centre (typically within an hour of the beginning of the experimental run), on a significant subset of the events collected in the express stream.

Based on the online and offline DQM histograms and the automatic flags set within DQMF, DQ shifters and detector system experts typically make a preliminary decision about the data quality of a given run within 24 hours. Starting from the detector data quality information, the Combined Performance groups then certify the validity of the data for future physics analyses (rejecting bad runs and categorising good runs for physics analysis). The physics analysis end-user consumes this information in the form of Good-Run-Lists, which are created from analysis-specific selections based on the data quality flags stored by the detector systems and by the Combined Performance groups.
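To make the end-user step concrete, the following sketch applies a Good-Run-List to a handful of events, assuming a simplified in-memory representation; the real lists are distributed as XML files, and the run numbers here are invented.

```python
# Hedged sketch of the end-user step: keep only events whose
# (run, luminosity block) pair passes the Good-Run-List selection.
# The in-memory GRL below is a simplified stand-in for the XML lists
# actually distributed in ATLAS, and the run numbers are invented.
good_run_list = {
    90210: [(1, 120), (130, 250)],   # run -> allowed lumi-block ranges
    90215: [(1, 300)],
}

def is_good(run, lumi_block):
    """True if the event falls inside a good lumi-block range."""
    return any(lo <= lumi_block <= hi
               for lo, hi in good_run_list.get(run, []))

events = [(90210, 100), (90210, 125), (90299, 10)]   # (run, lumi block)
selected = [ev for ev in events if is_good(*ev)]
print(selected)  # [(90210, 100)] -- the other events fail the GRL
```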

In fall 2008, ATLAS Global Monitoring (GM) was deployed as a new feature of DQMF. The GM employs the full offline reconstruction software to reconstruct events online, ahead of the Tier-0 DQA. Consequently, the GM can compare trigger-level and fully reconstructed quantities, and it facilitates prompt studies of correlations and synchronisation among all ATLAS sub-detectors. This new monitoring system reconstructs physics objects (jets, electrons, muons, missing transverse energy) in a subset of events containing interesting physics processes (for instance, production of quarkonia or of Z and W bosons). Currently, events are sampled at the TDAQ sub-farm input nodes (the output of the Level 2 trigger), but it is envisioned that Global Monitoring will eventually run on events selected by the express stream. To process all selected physics events, a farm of 70 CPU cores is used, providing a monitoring rate of 7 Hz, i.e. roughly ten seconds of full reconstruction per event on a single core. The GM framework also displays the data quality assessments from the sub-detectors and the physics/performance groups. As such, it helps collect and summarise the current status of the DQA, and it has been adopted by the ATLAS Data Quality shifters as their tool for data quality assessment.

Global Monitoring was successfully deployed as part of both the online and offline monitoring systems in the June-July 2009 cosmic run. It proved to be a mature and robust framework, ready for LHC collisions.


Figure: ATLAS Global Monitoring uses offline reconstruction software to process events online. Synchronisation and correlation among all ATLAS sub-detectors are just two examples of the new studies the GM allows; the example plot shows correlations between Inner Detector and Muon Spectrometer tracks.



Anadi Canepa

TRIUMF