ATLAS: analysing the data

21 April 2008


Primary DPDs provide a way to harvest the data and keep only the parts that are needed, by skimming, slimming and thinning. This may be a strategy for reducing the load on the Grid even with early data. (From Stathes Paganis’s talk during the April Overview Week)

With first data only a few months away, much work has gone into the tools and methodology that will be used to analyze that data.

The basic format of the analysis object data (AOD) has been fixed. This format can be analyzed within the ATLAS software framework, Athena, either interactively via Python or with compiled C++. The same AOD file can also be read directly from ROOT outside the framework, using AthenaRootAccess. Although AthenaRootAccess does not provide all of Athena's services (the conditions data, for example), it does make "analysis on a laptop" possible. Users who install an additional set of libraries, distributed as a tarball of order 1 GB, gain access to some of the Athena classes for such an analysis, provided that the laptop runs Scientific Linux CERN. Having a single format reduces both the storage requirements and the need to duplicate analysis tools for users working inside Athena and those using ROOT.

The current AOD is somewhat larger than the 100 kB/event assumed in the computing resource planning, and it contains more information than a typical analysis is likely to use. Derived Physics Data (DPD) are therefore expected to be produced by further processing of the AOD. This processing would remove entire events (skimming), remove entire data containers within an event (slimming) and remove parts of data containers (thinning) that are not needed for a particular analysis. The resulting DPDs will be smaller, so more events can be stored locally, and they will be much faster to analyze. A primary DPD would use the same format as the AOD, so that any analysis that runs on it could also run on the parent AOD; this reduces the need to duplicate tools and speeds up validation of the analysis.

Software is being developed that would allow "user data" to be added to these DPDs. For example, the basic content of a DPD used for Supersymmetry searches could be obtained by skimming away events with low jet multiplicity and little missing transverse energy, slimming away jet containers made by unused jet algorithms, and thinning away track parameters that are not used in such analyses. The total data volume occupied by DPDs is constrained by the available resources: only once we have experience with real data will we be able to find the best compromise between DPD size, the number of different DPDs and the number of copies.
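The three DPD operations, applied in the way the Supersymmetry example describes, can be illustrated with a toy sketch. This is not the ATLAS event data model or the real DPD-making software: events are plain Python dicts, and the container names (`JetsA`, `JetsB`, `Tracks`), the `params` field, the function names and the cuts (at least 4 jets, at least 100 GeV of missing transverse energy) are all hypothetical. Because the DPD keeps the same layout as its parent "AOD", the same analysis function runs unchanged on both.

```python
import copy

# Hypothetical toy event model, NOT the ATLAS EDM: each event is a dict with
# a missing-ET value (GeV) and named containers. Jet containers hold jet pT
# values; "Tracks" holds dicts whose "params" field stands in for track
# parameters that a given analysis does not need.

def skim(events, min_jets, min_met, jet_key="JetsA"):
    """Skimming: drop entire events with low jet multiplicity or low missing ET."""
    return [ev for ev in events
            if len(ev["containers"].get(jet_key, [])) >= min_jets
            and ev["missing_et"] >= min_met]

def slim(event, keep):
    """Slimming: drop entire containers (e.g. unused jet algorithms)."""
    out = copy.deepcopy(event)  # never modify the parent AOD event
    out["containers"] = {name: objs for name, objs in out["containers"].items()
                         if name in keep}
    return out

def thin(event, key="Tracks", drop_field="params"):
    """Thinning: keep the container but strip an unused part of each object."""
    out = copy.deepcopy(event)
    for obj in out["containers"].get(key, []):
        obj.pop(drop_field, None)
    return out

def make_dpd(aod_events):
    """Skim first, then slim and thin each surviving event (hypothetical cuts)."""
    selected = skim(aod_events, min_jets=4, min_met=100.0)
    return [thin(slim(ev, keep={"JetsA", "Tracks"})) for ev in selected]

def count_hard_jets(events, jet_key="JetsA", pt_cut=50.0):
    """A toy 'analysis' that works on AOD or DPD because the layout is shared."""
    return sum(1 for ev in events
               for pt in ev["containers"].get(jet_key, []) if pt > pt_cut)

# Made-up input events, for illustration only.
aod = [
    {"missing_et": 150.0, "containers": {
        "JetsA": [120.0, 80.0, 60.0, 40.0], "JetsB": [118.0, 79.0],
        "Tracks": [{"pt": 5.0, "eta": 0.3, "params": [0.1, 0.2]}]}},
    {"missing_et": 20.0, "containers": {
        "JetsA": [90.0, 70.0, 55.0, 30.0], "JetsB": [88.0],
        "Tracks": [{"pt": 3.0, "eta": -1.1, "params": [0.4]}]}},
    {"missing_et": 200.0, "containers": {
        "JetsA": [150.0, 45.0], "JetsB": [149.0], "Tracks": []}},
]

dpd = make_dpd(aod)
print(len(dpd))                   # 1: only the first event passes the skim
print(sorted(dpd[0]["containers"]))  # ['JetsA', 'Tracks']: JetsB slimmed away
print(count_hard_jets(aod), count_hard_jets(dpd))  # 7 3
```

Skimming is done before slimming and thinning so that no work is spent copying events that will be discarded anyway; the deep copies keep the parent AOD intact, mirroring the fact that a DPD is derived from, and does not replace, the AOD.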

The Full Dress Rehearsal (FDR) gives users an opportunity to become familiar with the tools and the analysis model before real data arrive. A natural follow-up to the CSC note exercise would be for analysts to learn the tools that will be used on real data. DPDs were not made for the small volume of low-luminosity FDR1 data, so the tools should be used on AOD; DPDs will, however, be made for FDR2.




Ian Hinchliffe

Lawrence Berkeley National Laboratory