The "DataMCForAnalysis" TWiki - a ‘one stop shop’

31 May 2010

The twiki-site


“Which data and MC files should I use?” is the first question when it comes to analysing data. Because new versions of the data and the Monte Carlo (MC) simulation are produced frequently, the answer can be rather confusing.

“The reconstruction of detector data requires extremely complex software algorithms to translate the measured raw hits and energy deposits together with spatial and timing information into physics objects,” explains Data Preparation Coordinator Andreas Hoecker, summarising the problem.

The reconstruction software is constantly improving as the detailed understanding of the detector response grows. “If the reconstruction software changes from one data sample to another, it is hard to disentangle genuine detector response effects - which also may be varying - from effects introduced by the software change. As a consequence, it is often preferable to have a stable reconstruction, rather than to have the latest and greatest software features,” Andreas continues.

On the other hand, some changes in the reconstruction software will not affect physics, but will help the operations team running the detector to detect possible problems better and faster. Such changes can be introduced. “All this is comprised in what we call the ‘frozen reconstruction strategy’: for a well-defined period of time, we do not allow any changes in the reconstruction software that would affect physics analysis. At the end of this period, the reconstruction software is upgraded to the next better release, again frozen, and so on,” he adds.

These periods are defined by so-called ‘reprocessing cycles’. Any physics analysis needs to compare real collision data with simulated data. It is therefore necessary to have a consistent set of simulated data that are reconstructed with exactly the same software release as the real data. At the beginning of each reprocessing cycle, all data taken so far are reprocessed. All this constitutes a huge amount of detailed information - more information than the typical user needs.

So how does a physics analyst in ATLAS know which software version was used for the data that are promptly reconstructed at the Tier-0? This is where the ‘DataMCForAnalysis’ twiki page comes in.

“We learned from users that it was not very transparent to them which MC and data belong together for their analysis. We mostly invented this new twiki page to help them understand this. The information was previously mostly contained in emails but now is in a central location for all ATLAS users. In addition we also use this now as a place to inform users of known problems,” said Deputy Data Preparation Coordinator Beate Heinemann, who co-initiated the twiki.

The idea was to set up something simple: a single source of information where analysts would find everything they need.


But this is not the only purpose. “Periods of consistent data processing are also lumped together in a huge dataset, denoted a ‘Physics Container’. The advantage of such Physics Containers is that any computer program analysing the whole data chunk only needs to be given a single name, the one of the Physics Container. It will then be able to analyse all the data belonging to this container. The DataMCForAnalysis twiki indicates the corresponding container for each of the (re)processing periods,” explains Andreas.

Another very important ‘ingredient’ is the data quality information. That is the role of the Data Quality Group, which checks the quality of all incoming data and flags defective data as bad. Good-run-lists are created for physics analysis, using the data quality information as input. These good-run-lists depend on the analysis type and on the detector systems that the analysis relies on. The DataMCForAnalysis twiki provides links to the relevant good-run-list information.

“Primarily, the twiki provides a list of ‘production/AMI’ tags which were used in data and Monte Carlo reprocessing campaigns and which should be then used together in the analysis. The plan is to keep the twiki up-to-date with main issues found in the reprocessing campaigns and add subsequent useful information and links from other existing twikis, to create a ‘one stop shop’ for physicists doing analysis,” said Monte Carlo Production Coordinator Borut Kersevan.

“I don't know if I would say that the analysis can now be completed more quickly. But, it's meant to help with the flow of information. To make it easier for users to know what data they should use, what are the advantages and disadvantages of different options, etc.,” answered Reprocessing Coordinator Adam Gibson, when asked about the benefits of the twiki.

And Andreas Hoecker adds: “With more data coming in, the page will grow and we will always have to settle a compromise between providing all the necessary and useful information, while being concise, in other words digestible for the ATLAS users.” So feedback from users would be highly appreciated!

Birgit Ewert

ATLAS e-News