Good run lists, what about them?

22 February 2010

With the availability of collision data, it is essential for everyone to understand what datasets are good for physics analysis. To define a good dataset we need Data Quality (DQ) information, as assessed by the DQ group. The approach to using DQ information in a physics analysis is through the use of dedicated lists of runs and luminosity blocks, known as “good run lists” (GRLs).

A luminosity block is the unit of time for data-taking, and lasts about two minutes. A good run list is formed by applying DQ criteria, and possibly other criteria, to the list of all valid physics runs and luminosity blocks.

Before we discuss the creation and application of GRLs in ATLAS, you need to know about Data Quality status flags, or DQ flags for short. DQ flags are the building blocks of good run lists; they ensure that the DQ assessment of experts is applied consistently in any physics analysis.

DQ flags are simple indicators of data quality, and act much like a traffic light. They are issued by each sub-detector and the combined performance groups, are set per luminosity block, and are valid for all data streams. There are more than 100 DQ flags stored in the COOL conditions database. Their interpretation can be found here:

Each sub-system is responsible for filling in its DQ flags. The lowest-level flag, filled automatically, is based on detector control system (DCS) conditions, such as nominal voltages, temperature, humidity, etc. Detector sub-systems fill in flags at a number of different stages, flagging possible hardware and data-taking problems. Automatic “online” flags are set during data taking; these can be overwritten by the detector shifter at Point 1 while the run is in progress.

These flags are then reviewed by an offline expert who can update the offline detector flag, which takes precedence over the online flags. For the combined performance (CP) groups the flags are set by a combination of automatic consistency checks and DQ shifters. A distilled summary of these stages, called LBSUMM, is stored in the conditions database as well, and is used for the creation of GRLs.
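The precedence between these stages can be pictured with a small Python sketch. It is an illustration only: the stage names and the "highest stage that set a flag wins" rule are assumptions made for this example, not the actual logic used to fill LBSUMM.

```python
from typing import Dict, Optional

# Hypothetical stage names, ordered from lowest to highest precedence.
STAGES = ("dcs", "online", "shifter", "offline")


def resolve_flag(stage_flags: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the flag of the highest-precedence stage that set one."""
    for stage in reversed(STAGES):
        flag = stage_flags.get(stage)
        if flag is not None:
            return flag
    return None  # no stage has filled a flag yet


# The offline expert overrides a green online flag.
print(resolve_flag({"dcs": "green", "online": "green", "offline": "red"}))  # red
# Without an offline entry, the shifter's flag is used.
print(resolve_flag({"dcs": "green", "online": "green", "shifter": "yellow"}))  # yellow
```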

DQ flags are frozen periodically to form a COOL database tag. A tag is issued after each new reconstruction processing of the data, or when defining a dataset epoch. These frozen flags can only take a red or green status. A red status indicates the data taken by the relevant sub-system has been declared bad, so recorded data should be excluded from physics analysis.

To form a GRL, a query requires a chosen set of DQ flags to be green, i.e. to indicate good data, for every luminosity block that is kept. The DQ assessment performed after the December 2009 data reprocessing is stored in COOL under tag DetStatusLBSUMM-December09-01.
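As a rough picture of what such a query does, here is a minimal Python sketch. It does not talk to COOL or use any ATLAS tool; the flag names, run numbers, and the in-memory dictionary of flags are all made up for the example.

```python
def build_grl(flags, required):
    """Keep luminosity blocks where every required flag is green.

    flags:    {(run, lumi_block): {flag_name: status}}
    required: names of the flags that must be green
    returns:  {run: [(first_lb, last_lb), ...]} with contiguous blocks merged
    """
    good = sorted(
        key for key, status in flags.items()
        if all(status.get(name) == "green" for name in required)
    )
    grl = {}
    for run, lb in good:
        ranges = grl.setdefault(run, [])
        if ranges and ranges[-1][1] == lb - 1:
            ranges[-1] = (ranges[-1][0], lb)  # extend the current range
        else:
            ranges.append((lb, lb))           # start a new range
    return grl


# Hypothetical flag names and run numbers, for illustration only.
ALL_GREEN = {"ATLAS_GLOBAL": "green", "L1CTP": "green", "PIXEL": "green"}
flags = {
    (1001, 1): dict(ALL_GREEN),
    (1001, 2): dict(ALL_GREEN),
    (1001, 3): dict(ALL_GREEN, PIXEL="red"),
    (1001, 4): dict(ALL_GREEN),
    (1002, 1): dict(ALL_GREEN, L1CTP="red"),
}

print(build_grl(flags, ["ATLAS_GLOBAL", "L1CTP", "PIXEL"]))
# {1001: [(1, 2), (4, 4)]}
```

A missing flag is treated as not green, so a luminosity block is kept only when every required flag has been filled and is good.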

Most of the available DQ flags are set by detector sub-systems. Some combined performance groups already fill flags as well, for example to qualify jet, missing energy, and tau reconstruction.

Two important flags you should be aware of are the “ATLAS Global” and “L1CTP” flags, which need to be green for any luminosity block in a physics data sample. These flags indicate, respectively, that the DQ information has been reviewed and that the L1 trigger is working without problems.

The DQ status flags will be used in the very near future by the combined performance (CP) and trigger groups to establish the requirements that declare the data good, flawed, or bad for given physics objects, e.g. electrons, photons, muons, taus, jets, MET, and b-tagging, but also for trigger slices, luminosity determination, etc.

As a result, GRLs for individual physics objects and trigger slices can be defined. Finally, the physics groups determine which final states are relevant for their analyses, and derive the corresponding physics DQ status using the DQ status flags assigned by the CP and trigger groups.

The physics DQ status, coupled with other run requirements, such as run range, minimum number of events per run, beam energy, the availability of certain trigger chains, determines a GRL for physics analysis. Also, work is currently ongoing to convert GRLs directly into corresponding ATLAS datasets, so good run lists and datasets should soon become synonymous.
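The run-level requirements listed above act as a filter that is applied alongside the DQ flags. A minimal Python sketch follows, assuming a hypothetical `RunInfo` record; the run numbers, event counts, and trigger name are made up for the example and are not taken from real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInfo:
    """Hypothetical per-run record; the field names are assumptions."""
    number: int
    n_events: int
    beam_energy_gev: float
    triggers: frozenset


def passes_run_requirements(run, first_run, last_run, min_events,
                            beam_energy_gev, required_triggers):
    """True if the run satisfies every run-level requirement."""
    return (first_run <= run.number <= last_run
            and run.n_events >= min_events
            and run.beam_energy_gev == beam_energy_gev
            and required_triggers <= run.triggers)  # subset test


runs = [
    RunInfo(1001, 50000, 450.0, frozenset({"L1_MBTS_1"})),
    RunInfo(1002, 200, 450.0, frozenset({"L1_MBTS_1"})),    # too few events
    RunInfo(1003, 80000, 1180.0, frozenset({"L1_MBTS_1"})), # other beam energy
]

selected = [
    r.number for r in runs
    if passes_run_requirements(r, 1000, 1999, 1000, 450.0,
                               frozenset({"L1_MBTS_1"}))
]
print(selected)  # [1001]
```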

Now, what should you use in practice? Officially recommended GRLs, to be used for physics analysis and publication of physics results, are (already!) created and distributed by the DQ group. They can be found on the Good Run List Generator page, and are simple text-based XML files containing run numbers and luminosity block ranges. The lists currently available cover (minimum bias) tracking, tau, muon combined performance, jet, and missing energy reconstruction studies on the November and December 2009 collision data. I suggest you take a look yourself to see what is there!
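To give a feeling for how little is needed to read such a file, here is a minimal Python sketch using only the standard library. The element and attribute names in the sample (`LumiBlockCollection`, `Run`, `LBRange`, `Start`, `End`) are assumptions about the layout, and the run numbers are made up; check them against a real GRL file, and prefer the official tools for actual analysis.

```python
import xml.etree.ElementTree as ET

# Assumed file layout with made-up run numbers, for illustration only.
SAMPLE = """<LumiRangeCollection>
  <NamedLumiRange>
    <LumiBlockCollection>
      <Run>1001</Run>
      <LBRange Start="1" End="2"/>
      <LBRange Start="4" End="10"/>
    </LumiBlockCollection>
    <LumiBlockCollection>
      <Run>1002</Run>
      <LBRange Start="5" End="7"/>
    </LumiBlockCollection>
  </NamedLumiRange>
</LumiRangeCollection>"""


def parse_grl(xml_text):
    """Return {run: [(first_lb, last_lb), ...]} from GRL-like XML text."""
    root = ET.fromstring(xml_text)
    grl = {}
    for collection in root.iter("LumiBlockCollection"):
        run = int(collection.findtext("Run"))
        ranges = [(int(r.get("Start")), int(r.get("End")))
                  for r in collection.findall("LBRange")]
        grl.setdefault(run, []).extend(ranges)
    return grl


def is_good(grl, run, lumi_block):
    """True if (run, lumi_block) falls inside one of the listed ranges."""
    return any(start <= lumi_block <= end for start, end in grl.get(run, []))


grl = parse_grl(SAMPLE)
print(grl)                    # {1001: [(1, 2), (4, 10)], 1002: [(5, 7)]}
print(is_good(grl, 1001, 3))  # False
print(is_good(grl, 1001, 4))  # True
```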

Technically, GRLs are compiled using the ATLAS run-query tool (AtlRunQuery). AtlRunQuery is a powerful tool to search for runs of interest, based on a user-provided search query.

You are invited to try out the ATLAS run-query web page! (This is also a great place to see the DQ flags currently filled and available.) Official GRL search queries are to be provided to the DQ group by the CP and physics groups.

These queries are stored in configuration files, so that GRLs are easily reproducible and maintainable. Good run lists are to be updated when a new DQ assessment is tagged in COOL, such as DetStatusLBSUMM-December09-01, tagged in January.

Several tools exist to use GRLs to select events and luminosity blocks. They are available for Athena-, ROOT-, and Python-based physics analyses, in the ATLAS software packages GoodRunsLists and GoodRunsListsUser. The tools comprise event and luminosity block selection based on GRLs and the application of Boolean operations to GRLs, and have been integrated with the luminosity calculation tools.
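The idea behind Boolean operations on GRLs is easy to see once a list is expanded into a set of (run, luminosity block) pairs. The Python sketch below is not the GoodRunsLists API; the function names and the two example lists are made up to illustrate the concept.

```python
def expand(grl):
    """Turn {run: [(first_lb, last_lb), ...]} into a set of (run, lb) pairs."""
    return {(run, lb)
            for run, ranges in grl.items()
            for start, end in ranges
            for lb in range(start, end + 1)}


def grl_and(a, b):
    """Luminosity blocks that are good in both lists."""
    return expand(a) & expand(b)


def grl_or(a, b):
    """Luminosity blocks that are good in at least one list."""
    return expand(a) | expand(b)


# Made-up lists, e.g. one for muon and one for jet studies.
muon_grl = {1001: [(1, 5)]}
jet_grl = {1001: [(4, 8)], 1002: [(1, 2)]}

print(sorted(grl_and(muon_grl, jet_grl)))  # [(1001, 4), (1001, 5)]
print(len(grl_or(muon_grl, jet_grl)))      # 10
```

An analysis that needs both muons and jets would use the AND of the two lists, while the OR is useful for seeing the total data that either list covers.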

The ROOT-based tools have been designed to be used either in an ATLAS release or in stand-alone applications, for example on your laptop. Detailed instructions and examples on how to use the available GRL tools can be found in the tutorial.

One remark on using the GRL selection tools: they work together with the existing luminosity calculation tools. In a physics analysis, the selection based on good runs and luminosity blocks needs to happen in two places: in the event loop, to skip events from bad runs and luminosity blocks, and in the bookkeeping of all processed luminosity blocks, which is needed to determine the luminosity of the data sample processed.

In an Athena ntuple production job, these two selections happen in two different locations. The latter is a tricky business, and can easily go wrong when you do not use the official tools. So you are encouraged to use them!
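Why the selection has to happen twice can be seen in a toy Python sketch. It is a simplified stand-in for the official tools: the event records, the list of processed luminosity blocks, and the per-block luminosity values (in arbitrary units) are all made up.

```python
def is_good(grl, run, lumi_block):
    """True if (run, lumi_block) falls inside one of the GRL ranges."""
    return any(start <= lumi_block <= end for start, end in grl.get(run, []))


def analyse(events, processed_lbs, grl, lumi_per_lb):
    """Apply the GRL in the event loop AND in the luminosity bookkeeping."""
    # Place 1: the event loop skips events from bad luminosity blocks.
    selected = [ev for ev in events if is_good(grl, ev["run"], ev["lb"])]
    # Place 2: the bookkeeping sums luminosity over ALL processed good
    # blocks, including those that contributed no selected events.
    good_lbs = {key for key in processed_lbs if is_good(grl, *key)}
    luminosity = sum(lumi_per_lb[key] for key in good_lbs)
    return selected, luminosity


grl = {1001: [(1, 2)]}
events = [
    {"run": 1001, "lb": 1, "id": 7},
    {"run": 1001, "lb": 3, "id": 8},  # bad luminosity block, skipped
]
processed_lbs = [(1001, 1), (1001, 2), (1001, 3)]
lumi_per_lb = {(1001, 1): 1.0, (1001, 2): 2.0, (1001, 3): 4.0}

selected, luminosity = analyse(events, processed_lbs, grl, lumi_per_lb)
print([ev["id"] for ev in selected])  # [7]
print(luminosity)                     # 3.0
```

Note that luminosity block 2 holds no events in this toy sample, yet it still counts towards the luminosity. Deriving the luminosity only from the blocks seen in the event loop would give 1.0 instead of 3.0, which is exactly the kind of mistake the official tools guard against.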

In summary, good run lists are the way to select good data samples for physics analysis. Tools for their generation and application have been available for some time now. Good run lists are already heavily used in many analyses using minimum bias data. They are maintained and provided by the DQ group. Be sure to interact with your Combined Performance or Physics group to decide which DQ flag queries are appropriate for your physics analysis. And don’t forget to report them to the DQ group.




Max Baak