Get a tag on that b-jet: B-tagging in ATLAS

8 December 2008

Identification of b-jets is crucial for reaching the full physics potential of ATLAS. Several channels of interest have b quarks in their final state, while the backgrounds to these channels often have few b-jets. Without distinguishing the b-jets from other jets, these backgrounds are difficult to reduce. The tagging of b-jets will also play an important role in precision measurements of the top quark, which decays almost exclusively to a b quark and a W. Any channel that has a top quark in its signature, or where top is a background, will therefore benefit from b-tagging.

Another example is one of the important channels for a low mass Higgs: H→bb̄. This decay mode is best searched for in events where the Higgs is produced in association with a top, anti-top pair (tt̄), where the final state has a distinct signature of four b quarks. This is quite challenging as the production cross section is small, so highly efficient b-tagging is required and good rejection against the large non-b-jet backgrounds is essential. Looking beyond the Standard Model, many channels contain b quarks, such as Higgs bosons in SUSY models, SUSY decay chains and decays of heavy gauge bosons.

During the hadronization process, a b quark will typically form a b hadron which carries most of the momentum of the original b quark. The separation of b-jets from other jets takes advantage of two facts. First, hadrons containing a b quark have a relatively long lifetime, resulting in decay lengths of the order of a few millimeters. The silicon tracker in ATLAS, with its high-resolution pixel detector, makes it possible to measure these small distances in the center of the beam pipe. Tracks which originate from the b hadron decay will form secondary vertices and will have larger impact parameters than tracks coming from the primary vertex. Second, about 20% of b hadrons (or subsequently produced charmed hadrons) produce a lepton in their decay, and the properties of a lepton found near the jet can help to identify the jet as a b-jet.
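As a rough illustration of where the "few millimeters" scale comes from, the mean decay length in the laboratory frame is L = βγcτ, with βγ = p/m. The sketch below assumes a typical b-hadron mass of about 5.28 GeV and a lifetime of about 1.5 ps. These values and the function name are illustrative assumptions, not numbers taken from this article.

```python
# Speed of light expressed in millimeters per picosecond.
SPEED_OF_LIGHT_MM_PER_PS = 0.299792458


def mean_decay_length_mm(momentum_gev, mass_gev=5.28, lifetime_ps=1.5):
    """Mean lab-frame decay length L = beta*gamma * c * tau.

    The default mass and lifetime are assumed, typical b-hadron values.
    """
    beta_gamma = momentum_gev / mass_gev  # beta*gamma = p / m
    return beta_gamma * SPEED_OF_LIGHT_MM_PER_PS * lifetime_ps


if __name__ == "__main__":
    for momentum in (20.0, 50.0, 100.0):
        length = mean_decay_length_mm(momentum)
        print(f"p = {momentum:5.1f} GeV -> mean decay length = {length:.1f} mm")
```

Under these assumptions, a b hadron with a momentum of 50 GeV travels roughly 4 mm on average before decaying.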

The most sophisticated b-taggers use probability distributions of various variables such as the impact parameter. For each track, a weight is calculated as the ratio of the likelihood that a track with a given impact parameter comes from a b-jet to the likelihood that it comes from a light jet (a jet originating from an up, down or strange quark or a gluon). By summing the logarithms of these track weights over all tracks in the vicinity of the jet, one gets good discrimination between b-jets and light jets, as seen in the figure below. The same is done for other variables, such as various characteristics of secondary vertices. For example, the vertex mass tends to be higher in b-jets.

Jet b-tagging weight distribution for b-jets, c-jets and light jets for the transverse impact parameter tagging algorithm, IP2D.
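The likelihood-ratio weight described above can be sketched as follows. The bin edges and the two probability tables are invented for illustration and are not ATLAS calibrations; the function names are hypothetical. The sketch sums the logarithm of the per-track ratio, which is the usual way of combining independent likelihood ratios and is equivalent to multiplying them.

```python
import math
from bisect import bisect_right

# Hypothetical bins in signed impact parameter significance (d0 / sigma_d0).
BIN_EDGES = [-20.0, -2.0, 0.0, 2.0, 4.0, 8.0, 20.0]
# Hypothetical per-bin probabilities for tracks in b-jets and in light jets.
# Each table sums to 1. These numbers are made up for the example.
PDF_B = [0.02, 0.18, 0.30, 0.20, 0.18, 0.12]
PDF_LIGHT = [0.03, 0.42, 0.45, 0.07, 0.02, 0.01]


def track_log_ratio(significance):
    """Log of the b-jet to light-jet likelihood ratio for one track."""
    # Clip so that out-of-range values fall into the first or last bin.
    clipped = min(max(significance, BIN_EDGES[0]), BIN_EDGES[-1] - 1e-9)
    index = bisect_right(BIN_EDGES, clipped) - 1
    return math.log(PDF_B[index] / PDF_LIGHT[index])


def jet_weight(track_significances):
    """Jet weight: sum of the per-track log likelihood ratios."""
    return sum(track_log_ratio(s) for s in track_significances)


if __name__ == "__main__":
    displaced_tracks = [5.0, 6.0, 3.0]   # b-jet-like: large positive values
    prompt_tracks = [-1.0, 0.5, -0.5]    # light-jet-like: close to zero
    print("b-like jet weight:    ", jet_weight(displaced_tracks))
    print("light-like jet weight:", jet_weight(prompt_tracks))
```

With these made-up tables, a jet whose tracks have large positive impact parameter significance gets a positive weight, while a jet whose tracks are compatible with the primary vertex gets a negative one.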

One can gain further discrimination by exploiting the topology of the decay chain from a b hadron to a charm hadron. This also improves the discrimination between b-jets and charm jets. The best-performing taggers combine impact parameter and secondary vertex weights; at a b-tagging efficiency of 50% they can achieve rejections against light jets of several hundred. A rejection of 100 means that, for every 100 light jets, one is incorrectly tagged as a b-jet. The second figure shows the light-jet rejection versus b-tagging efficiency for several taggers.

Rejection of light jets versus b-jet efficiency for tt̄ events for several tagging algorithms. JetProb compares the impact parameter with a resolution function; IP2D uses the transverse impact parameter; IP3D combines transverse and longitudinal impact parameter information; IP3D+SV1 combines IP3D with secondary vertex information; JetFitter uses topological information in the secondary vertex fit.
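To make the two axes of this figure concrete, here is a minimal sketch of how an efficiency and a rejection could be computed for one cut on the jet weight. The function names and the toy weights are hypothetical. Rejection is taken as the inverse of the fraction of light jets that pass the cut, in line with the definition given above.

```python
def b_tag_efficiency(b_jet_weights, cut):
    """Fraction of true b-jets whose weight is above the cut."""
    tagged = sum(1 for w in b_jet_weights if w > cut)
    return tagged / len(b_jet_weights)


def light_jet_rejection(light_jet_weights, cut):
    """Number of light jets per light jet wrongly tagged as a b-jet."""
    mistagged = sum(1 for w in light_jet_weights if w > cut)
    if mistagged == 0:
        return float("inf")  # no light jet passes the cut in this sample
    return len(light_jet_weights) / mistagged


if __name__ == "__main__":
    # Toy samples: 4 b-jets and 100 light jets, one of which passes the cut.
    toy_b_weights = [5.0, 3.5, 1.0, -0.5]
    toy_light_weights = [-1.0] * 99 + [3.0]
    cut = 2.0
    print("efficiency:", b_tag_efficiency(toy_b_weights, cut))
    print("rejection: ", light_jet_rejection(toy_light_weights, cut))
```

Scanning the cut value and plotting one quantity against the other would trace out a curve of the kind shown in the figure.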

While the most advanced taggers give the best performance, they rely on probability distribution functions which are obtained from Monte Carlo or which require sophisticated techniques to be derived from data. Therefore, during commissioning, simpler taggers will be used, such as counting the number of tracks with high impact parameter or comparing the impact parameter to a resolution function that can be more readily obtained from data. While they do not perform as well, these taggers can still achieve rejection factors of around 100 at a b-tagging efficiency of 50%.
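A track-counting tagger of the kind mentioned above can be sketched in a few lines. The thresholds (at least two tracks with an impact parameter significance above 3) and the function names are assumptions chosen for illustration, not the values used in ATLAS.

```python
def count_displaced_tracks(track_significances, min_significance=3.0):
    """Count tracks whose signed impact parameter significance is large."""
    return sum(1 for s in track_significances if s > min_significance)


def is_b_tagged(track_significances, min_tracks=2, min_significance=3.0):
    """Tag the jet if enough of its tracks are significantly displaced."""
    n_displaced = count_displaced_tracks(track_significances, min_significance)
    return n_displaced >= min_tracks


if __name__ == "__main__":
    print(is_b_tagged([4.2, 0.3, 6.1, -0.8]))  # two displaced tracks
    print(is_b_tagged([0.5, -1.2, 3.4, 0.1]))  # only one displaced track
```

The appeal of this approach is that it needs no reference distributions for b-jets and light jets, only a reasonable estimate of the impact parameter resolution.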

Since b-tagging relies on high-precision tracking, it is very sensitive to anything that degrades the tracking performance, such as material interactions and detector inefficiencies, and to the ability to unambiguously assign hits to tracks, especially in high-density jets. It is also sensitive to the alignment of the Inner Detector, and studies have been made to estimate the effects of misalignment. In the Monte Carlo production for the CSC studies, large misalignments, typical of the known placement and as-built precision of the detector, were introduced in the simulation. While most studies used perfect knowledge of these misalignments during reconstruction, a realistic exercise was also done in which the actual alignment procedures that will be used in ATLAS were applied. Using the resulting alignment, we saw at most a 25% reduction in the b-tagging performance.

It is important not to rely entirely on calibrations from Monte Carlo, and so procedures have been developed to extract the b-tagging efficiency from data. Methods similar to those used at the Tevatron can be applied to QCD di-jet events and work well for jets up to about 80 GeV. Their uncertainties are dominated by systematics, with a precision of around 6%. The large number of tt̄ events at the LHC will allow the b-tagging calibration to be obtained with top events. As one expects two b-jets to be present in each tt̄ event, one can extract the b-tagging efficiency by counting the number of jets tagged as b-jets. It is also possible to fully reconstruct the tt̄ event and obtain a high-purity sample of b-jets. It is difficult to get a 100% pure sample, so data-driven background subtraction techniques were developed. The b-jet samples obtained can then be used directly to measure the b-tagging efficiency or to obtain any property of the b-jet, such as the probability distribution functions. These methods can achieve a statistical precision of around 6% with a few hundred pb⁻¹ of data. The measurement of the rate at which light jets are mistagged as b-jets is also very challenging and is the subject of ongoing work.
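The idea behind the tag-counting method can be illustrated with a deliberately idealized sketch. It assumes that every selected event contains exactly two b-jets within acceptance, that each is tagged independently with the same efficiency ε, and that there are no mistagged jets and no background. Under those assumptions the expected numbers of events with one and two tags are N₁ = 2Nε(1−ε) and N₂ = Nε², so ε = 2N₂ / (N₁ + 2N₂). The function name is hypothetical, and a real measurement has to correct for all the effects ignored here.

```python
def efficiency_from_tag_counts(n_one_tag, n_two_tags):
    """Estimate the b-tagging efficiency from counts of tagged events.

    Idealized: two b-jets per event, independent tagging, no mistags,
    no background. Then eff = 2 * N2 / (N1 + 2 * N2).
    """
    tagged_jets = n_one_tag + 2 * n_two_tags
    if tagged_jets == 0:
        raise ValueError("no tagged jets: the efficiency cannot be estimated")
    return 2 * n_two_tags / tagged_jets


if __name__ == "__main__":
    # Toy counts consistent with 1000 events and a true efficiency of 0.6:
    # N1 = 2 * 1000 * 0.6 * 0.4 = 480 and N2 = 1000 * 0.6**2 = 360.
    print(efficiency_from_tag_counts(n_one_tag=480, n_two_tags=360))
```

Note that the estimate does not require knowing the total number of events, since events with no tagged jet drop out of the ratio.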

Making b-tagging a reality in ATLAS has been, and continues to be, a complex task involving the work of many people, with more always welcome. Much of the work that has been done over the last several years is documented in the soon-to-be-released CSC book.


Grant Gorfine

University of Wuppertal