Analysis Navigation with ATLAS TAGS

5 October 2009

Navigating in a flood of events.



Physicists have broad interests and responsibilities in ATLAS. They need to find and analyze, offline, the events that suit the task at hand. Right now, a single event seems as priceless as a drop of water in the desert. But we hope that in a few months we will be flooded with events, at which point you will need a watertight boat, a strong paddle and good navigation. The ATLAS TAGS application is a system intended to help you identify, count, pre-assess, locate, and retrieve the tidy pool of events you are interested in analyzing, extracted from the sea of events you are not.

ATLAS TAGS are the files produced in the final stage of the official ATLAS event processing chain (RAW → ESD → AOD → TAG). TAGS contain what we believe is a substantive set of event-wise metadata from which you can make your event selections. The content of TAG files is "uploaded" to Oracle relational databases hosted at various sites (Tier-0: CERN; Tier-1: BNL, TRIUMF, RAL; Tier-2: DESY).

The "ATLAS TAG Database application" is a collection of data and services. The data include the event-wise TAG Database and Run and Luminosity Block-wise Conditions metadata, along with dataset, file and processing metadata. The application centers around "ELSSI" (our Event Level Selection Service Interface), a web-based user interface to TAG Application services. ELSSI presents the user with a wide variety of options for customizing a set of requirements on the collected metadata. It allows the user to preview the results of customized criteria in real time, which helps refine those criteria before the serious analysis begins. ELSSI allows you to preview sets of events without cracking open a single AOD. Then, it helps you navigate to the specific AOD files which contain the events ideally suited to your task at hand.
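The select, preview and navigate workflow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ELSSI implementation: the record fields, function names, run numbers and file names below are all hypothetical, and a small in-memory list stands in for the Oracle TAG database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TagRecord:
    """Hypothetical event-wise TAG record; field names are illustrative only."""
    run: int
    event: int
    n_electrons: int
    missing_et_gev: float
    aod_file: str  # stand-in for the back-navigation reference to the AOD


def select(tags: Iterable[TagRecord],
           criteria: Callable[[TagRecord], bool]) -> List[TagRecord]:
    """Return the TAG records that satisfy the user's criteria."""
    return [t for t in tags if criteria(t)]


def preview(selected: Iterable[TagRecord]) -> Counter:
    """Count selected events per run, without touching any AOD file."""
    return Counter(t.run for t in selected)


def aod_files(selected: Iterable[TagRecord]) -> List[str]:
    """List the distinct AOD files that hold the selected events."""
    return sorted({t.aod_file for t in selected})


# Made-up records, for illustration only.
TAGS = [
    TagRecord(90001, 1, 2, 12.0, "aod_a.root"),
    TagRecord(90001, 2, 0, 55.0, "aod_a.root"),
    TagRecord(90002, 7, 1, 40.5, "aod_b.root"),
    TagRecord(90002, 8, 2, 31.0, "aod_c.root"),
]

# Example criteria: at least one electron and missing ET above 30 GeV.
chosen = select(TAGS, lambda t: t.n_electrons >= 1 and t.missing_et_gev > 30.0)
print(preview(chosen))    # per-run event counts for the selection
print(aod_files(chosen))  # only these files would need to be opened
```

The point of the sketch is the ordering: the criteria are refined against lightweight metadata first, and the list of AOD files is derived only once the selection looks right.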

We (the ATLAS TAG developers) originally envisioned all data and services being duplicated at all volunteering sites. But the combination of the sheer volume of database data and the difficulty of distributing services to diversely configured systems has, in the last six months, pointed us toward a different architecture. Our new model consists of distributed data and services which optimize the resources available at each of the sites in the TAG network. In the long run, we believe the new architecture will let us deploy more efficiently the TAG services that users find most useful.

Event-wise TAGs are, as you might expect, only as good as the reprocessing stage that created them. The same is true for ESD, AOD, and in some respects the DPD. While this limitation applies to reconstructed objects like electrons, jets and muons, the TAG database has the advantage that it can access and upload updated conditions data such as Data Quality assessments and Luminosity, which will undoubtedly evolve faster than the pace of reprocessing.

TAGs contain information to help users navigate transparently to the corresponding events in any of the earlier stages of processing, including RAW data. Using TAGs in a DPD-based analysis presents a special challenge: the DPDs are produced after the official reprocessing chain is complete, so the navigation information from TAGs to DPDs is inherently missing. Ideas about how to append this information to the TAGs exist, but this is one area where we need help from the collaboration for the implementation.

Nice idea, but does it work? The infrastructure behind the application is a technical challenge in many respects:

  • loading potentially terabyte-scale amounts of event metadata into the various Oracle sites,
  • anticipating the most useful conditions and dataset related metadata and uploading that data into systems compatible with the user interface as well as the application services,
  • tracking which data and services are where and optimizing those services transparently for the user,
  • handling cases where components of metadata are incomplete for particular runs or datasets,
  • gracefully preventing the combination of results from event collections that satisfy the user's criteria but should not be combined because of differences in configuration or processing,
  • anticipating and making available the most useful selection criteria satisfying a wide variety of users (from commissioning to physics analysis) in functional, intuitive and usable interfaces,
  • employing standard and customized utilities for processing user requests on the grid (for example, figuring out which files to pre-fetch, which may come from multiple datasets),
  • making it all work correctly and efficiently enough to convince users to use the TAGs rather than head straight for a heavy file-based analysis (tussling through AOD datasets to formulate initial criteria).
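One of the challenges listed above, refusing to combine event collections that differ in configuration or processing, can be illustrated with a minimal Python sketch. It is not how the TAG services do it: the `Collection` fields (`release`, `trigger_menu`) and the function names are assumptions chosen for the example, and real compatibility rules would involve more metadata than two labels.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Collection:
    """Hypothetical event collection plus the metadata that defines compatibility."""
    name: str
    release: str       # assumed label for the reconstruction release
    trigger_menu: str  # assumed label for the trigger configuration
    n_events: int


def group_compatible(
    collections: Iterable[Collection],
) -> Dict[Tuple[str, str], List[Collection]]:
    """Group collections by the (release, trigger_menu) pair they share."""
    groups: Dict[Tuple[str, str], List[Collection]] = {}
    for c in collections:
        groups.setdefault((c.release, c.trigger_menu), []).append(c)
    return groups


def combined_count(collections: List[Collection]) -> int:
    """Sum event counts, but only if every collection shares one configuration."""
    groups = group_compatible(collections)
    if len(groups) > 1:
        # Tell the user why the request was refused instead of silently mixing.
        raise ValueError(
            f"refusing to combine collections with differing processing: {sorted(groups)}"
        )
    return sum(c.n_events for c in collections)
```

Raising an informative error, rather than returning a mixed total, is one way to make the restriction "graceful": the user learns which configurations clash and can narrow the request.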

Considerable progress has been made on many fronts. TAG production and upload are now a routine and automatic part of the official chain of event reconstruction. TAG production from reprocessing of real and simulated data is similarly automated, but upload to the database still needs some manual intervention: people decide which data are in highest demand and which sites should host them, so that users get the most efficient access. Currently, TAGs are available for many simulation datasets (the latest being MC08 and TOPMIX) and online Commissioning/Cosmic runs. TAGs from October online commissioning and November first collisions will be produced and uploaded automatically in the course of Tier-0 reconstruction.

We are still in the process of identifying, organizing and uploading much of the run and dataset related metadata, which is needed to formulate logically complete selection criteria and is used behind the interface to complete the services architecture (generating the various outputs). Loading this data is half the battle -- the other half is making selections on this data available in functional, intuitive and usable interfaces. Because the application is evolving, the astute follower of ELSSI will notice a gradual expansion and reorganization of the selection criteria available, better exception handling to inform users of logical incongruities, and more robust execution of application services.

Each ATLAS Software tutorial includes a session on TAG usage, the next of which is scheduled for October 21, 2009 (see the ATLAS Tutorial Indico). The portal to TAG Application Services requires grid certification and membership in the ATLAS VO. Many aspects of the application are still in development, so user patience is greatly appreciated, and feedback is welcome: contact ATLAS Physics Metadata.

Happy navigation!


 

Elizabeth Gallas

University of Oxford