AGC Analysis Task Versions

The table below gives a brief overview of the AGC versions. Each version corresponds to a slightly altered analysis task.

Versions

| Version | Datasets | Cuts | Systematics | Machine Learning |
| --- | --- | --- | --- | --- |
| 0 | CMS 2015 Open Data (POET) | Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5 | \(t\bar{t}\) sample variations, pt_scale variations, pt_res variations, btag variations, W + jets scale variations, luminosity | None |
| 1 | CMS 2015 Open Data (NanoAOD) | Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5 | \(t\bar{t}\) sample variations, pt_scale variations, pt_res variations, btag variations, W + jets scale variations, luminosity | None |
| 2 (WIP) | CMS 2015 Open Data (NanoAOD) | Exactly one lepton with \(p_T>30\) GeV; at least four jets with \(p_T>30\) GeV; at least one jet with \(b\)-tag > 0.5 (see Cuts for additional cuts) | \(t\bar{t}\) sample variations, pt_scale variations, pt_res variations, btag variations, W + jets scale variations, luminosity | BDT to predict jet-parton assignment in \(t\bar{t}\) events |

Reference Implementation Versions

This section is specific to the implementation in the main repository.

The table below gives a brief overview of the tags of the reference implementation, including implementation-specific minor versions and patches. Major versions (0, 1, and 2) correspond to differences in the analysis task (described above), while minor versions are reserved for individual implementations to assign for small changes and patches. The reference implementation for each major task version (0, 1, 2) is always the latest tag within that series.

Tags

| Tag | Version | Available Pipelines | Systematics | Dependency Management |
| --- | --- | --- | --- | --- |
| 0.1.0 | 0 | Pure coffea; coffea with ServiceX processor | Systematic variations within coffea processor are manually calculated using awkward array logic (jet \(p_T\) variations are not propagated through signal region observable calculation) | Functions used in coffea processor are defined in the notebook |
| 0.2.0 | 0 | Pure coffea; create cached files using ServiceX queries followed by standalone coffea processing | Systematic variations within coffea processor are manually calculated using awkward array logic (jet \(p_T\) variations are not propagated through signal region observable calculation) | Functions used in coffea processor are defined in the notebook |
| 1.0.0 | 1 | Pure coffea; create cached files using ServiceX queries followed by standalone coffea processing | Systematic variations within coffea processor are manually calculated using awkward array logic (jet \(p_T\) variations are not propagated through signal region observable calculation) | Functions used in coffea processor are defined in the notebook |
| 1.1.0 | 1 | Pure coffea; create cached files using ServiceX queries followed by standalone coffea processing | Systematic variations within coffea processor are manually calculated using awkward array logic (jet \(p_T\) variations corrected) | Functions used in coffea processor are defined in the notebook |
| 2.0.0 (WIP) | 2 | Pure coffea; create cached files using ServiceX queries followed by standalone coffea processing; optional machine learning component (with additional option to use NVIDIA Triton inference server) | Systematic variations within coffea processor handled by correctionlib | Modules are shipped to dask workers using cloudpickle |
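The rule that the reference implementation for each task is the latest tag within its major series can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_tag` and `latest_per_task` are hypothetical and not part of the repository, and the tag list is copied from the table with the "(WIP)" annotation dropped.

```python
from collections import defaultdict

# Tags listed in the table; the "(WIP)" annotation is dropped for parsing.
TAGS = ["0.1.0", "0.2.0", "1.0.0", "1.1.0", "2.0.0"]


def parse_tag(tag):
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in tag.split("."))
    return major, minor, patch


def latest_per_task(tags):
    """Map each major (task) version to the highest tag in that series."""
    by_major = defaultdict(list)
    for tag in tags:
        by_major[parse_tag(tag)[0]].append(tag)
    # Compare numerically rather than as strings, so 1.10.0 sorts after 1.9.0.
    return {major: max(group, key=parse_tag) for major, group in by_major.items()}


reference_tags = latest_per_task(TAGS)
print(reference_tags)  # {0: '0.2.0', 1: '1.1.0', 2: '2.0.0'}
```

The major component alone identifies the analysis task, so two implementations with different minor versions but the same major version are solving the same task.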

Datasets

The datasets used for the CMS \(t\bar{t}\) notebook are from the 2015 CMS Open Data release. Versions 0.1.0 and 0.2.0 use ntuples generated using the Physics Objects Extractor Tool (POET).

All versions >=1.0.0 use NanoAOD instead. The NanoAOD was generated from the 2015 CMS Open Data release using this pull request of CMSSW: cms-sw/cmssw#39040. To set up the corresponding environment, run the following commands:

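# Set up the CMS software environment from CVMFS
source /cvmfs/cms.cern.ch/cmsset_default.sh
# List the available CMSSW_10_6_* releases
scram list CMSSW_10_6_
# Create and enter a CMSSW_10_6_30 project area
scram project CMSSW_10_6_30
cd CMSSW_10_6_30/
# Activate the runtime environment for this release
cmsenv
cd src/
# Merge the changes from pull request cms-sw/cmssw#39040
git cms-merge-topic 39040
ls -al
# Build with five parallel jobs
scram build -j5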

From this point, for data, you can use:

cmsDriver.py --python_filename doublemuon_cfg.py --eventcontent NANOAOD --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --fileout file:doublemuon_nanoaod.root --conditions 106X_dataRun2_v36 --step NANO --filein file:doublemuon_miniaod.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --data -n -1

For MC, you can use:

cmsDriver.py --python_filename nanoaod15_cfg.py --eventcontent NANOAODSIM --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAODSIM --fileout file:nanoaod15.root --conditions 102X_mcRun2_asymptotic_v8 --step NANO --filein file:miniaod2015.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --mc -n -1
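The data and MC commands differ only in the event content, data tier, conditions tag, file names, and the `--data`/`--mc` switch. The sketch below makes that explicit by assembling both commands from shared options. The function name `cms_driver_command` is hypothetical, every flag and value is copied from the two commands above, and the option order differs from the originals.

```python
import shlex

# Options that are identical in the data and MC commands.
COMMON = [
    "--customise", "Configuration/DataProcessing/Utils.addMonitoring",
    "--step", "NANO",
    "--era", "Run2_25ns,run2_nanoAOD_106X2015",
    "--no_exec",
    "-n", "-1",
]


def cms_driver_command(is_mc, cfg_name, infile, outfile):
    """Build the cmsDriver.py command line for data or MC as one string."""
    tier = "NANOAODSIM" if is_mc else "NANOAOD"
    conditions = "102X_mcRun2_asymptotic_v8" if is_mc else "106X_dataRun2_v36"
    specific = [
        "cmsDriver.py",
        "--python_filename", cfg_name,
        "--eventcontent", tier,
        "--datatier", tier,
        "--fileout", f"file:{outfile}",
        "--conditions", conditions,
        "--filein", f"file:{infile}",
        "--mc" if is_mc else "--data",
    ]
    return shlex.join(specific + COMMON)


data_cmd = cms_driver_command(
    False, "doublemuon_cfg.py", "doublemuon_miniaod.root", "doublemuon_nanoaod.root"
)
mc_cmd = cms_driver_command(True, "nanoaod15_cfg.py", "miniaod2015.root", "nanoaod15.root")
print(data_cmd)
print(mc_cmd)
```

Keeping the shared options in one place makes it harder for the data and MC configurations to drift apart when a flag changes.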

The code used to generate and subsequently merge these files is located in the following repository: ekauffma/produce-nanoAODs

The same datasets are used regardless of the input format (MiniAOD or NanoAOD). The datasets are listed below, grouped by process:

  • ttbar:

    • nominal:

      • 19980: Powheg + Pythia 8 (ext3), 2413 files, 3.4 TB -> converted

      • 19981: Powheg + Pythia 8 (ext4), 4653 files, 6.4 TB -> converted

    • scale variation:

      • 19982: same description as 19983 (below); unclear whether the two samples overlap

      • 19983: Powheg + Pythia 8 “scaledown” (ext3), 902 files, 1.4 TB -> converted

      • 19984: same description as 19985 (below); unclear whether the two samples overlap

      • 19985: Powheg + Pythia 8 “scaleup” (ext3), 917 files, 1.3 TB -> converted

    • ME variation:

      • 19977: same description as 19978 (below); unclear whether the two samples overlap

      • 19978: aMC@NLO + Pythia 8 (ext1), 438 files, 647 GB -> converted

    • PS variation:

      • 19999: Powheg + Herwig++, 443 files, 810 GB -> converted

  • single top:

    • s-channel:

    • t-channel:

      • 19406: Powheg + Pythia 8 (antitop), 935 files, 1.1 TB -> converted

      • 19408: Powheg + Pythia 8 (top), 1571 files, 1.8 TB -> converted

    • tW:

      • nominal:

        • 19412: Powheg + Pythia 8 (antitop), 27 files, 30 GB -> converted

        • 19419: Powheg + Pythia 8 (top), 23 files, 30 GB -> converted

      • DS:

        • 19410: Powheg + Pythia 8 DS (antitop), 13 files, 15 GB

        • 19417: Powheg + Pythia 8 DS (top), 13 files, 14 GB

      • scale variations:

        • 19415: Powheg + Pythia 8 “scaledown” (antitop), 11 files, 15 GB

        • 19422: Powheg + Pythia 8 “scaledown” (top), 13 files, 15 GB

        • 19416: Powheg + Pythia 8 “scaleup” (antitop), 12 files, 14 GB

        • 19423: Powheg + Pythia 8 “scaleup” (top), 13 files, 14 GB

      • there are also larger NoFullyHadronicDecays samples: 19411, 19418

    • tZ / tWZ: potentially missing in inputs, not included in /ST_*

  • W+jets:

    • nominal (with 1l filter):

      • 20546: same description as 20547 (below); unclear whether the two samples overlap

      • 20547: aMC@NLO + Pythia 8 (ext2), 5601 files, 4.5 TB -> converted

      • 20548: aMC@NLO + Pythia 8 (ext4), 4598 files, 3.8 TB -> converted

  • data:

    • single muon:

      • 24119: 1916 files, 1.4 TB -> converted

    • single electron:

      • 24120: 2974 files, 2.6 TB -> converted

    • validated runs:

More information about datasets can be found in analysis-grand-challenge/datasets/cms-open-data-2015/.
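As an illustration of how the list above might be held in code, the sketch below stores the samples marked "converted" in a nested mapping of process, variation, and (identifier, file count) pairs. The key names such as `single_top_t_chan` and the helper `n_files` are hypothetical; the identifiers and file counts are copied from the list.

```python
# (identifier, number of files) for the samples marked "converted" in the list.
CONVERTED = {
    "ttbar": {
        "nominal": [(19980, 2413), (19981, 4653)],
        "scaledown": [(19983, 902)],
        "scaleup": [(19985, 917)],
        "ME_var": [(19978, 438)],
        "PS_var": [(19999, 443)],
    },
    "single_top_t_chan": {"nominal": [(19406, 935), (19408, 1571)]},
    "single_top_tW": {"nominal": [(19412, 27), (19419, 23)]},
    "wjets": {"nominal": [(20547, 5601), (20548, 4598)]},
    "data": {
        "single_muon": [(24119, 1916)],
        "single_electron": [(24120, 2974)],
    },
}


def n_files(process, variation):
    """Total number of files for one process and variation."""
    return sum(count for _, count in CONVERTED[process][variation])


for process, variations in CONVERTED.items():
    for variation in variations:
        print(f"{process:>18} {variation:>16} {n_files(process, variation):>6}")
```

A layout like this keeps the systematic variations of a process next to its nominal sample, which is convenient when looping over variations.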

Cuts

For versions 0.1.0, 0.2.0, and 1.0.0, the cuts used are the following:

  • Leptons (electrons and muons) must have \(p_T>25\) GeV

  • Events must contain exactly one lepton

  • Jets must have \(p_T>25\) GeV

  • Events must have at least four jets

  • Jets are considered \(b\)-tagged if they have a \(b\)-tag score over B_TAG_THRESHOLD=0.5.

  • Events must have at least one \(b\)-tagged jet

  • 4j1b Region: Events must have exactly one \(b\)-tagged jet

  • 4j2b Region: Events must have two or more \(b\)-tagged jets
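The selection above can be written as a small per-event function. This is a plain-Python sketch for illustration, not the awkward-array code of the reference implementation. The function name `classify_event` and the input layout are hypothetical, and counting \(b\)-tags only among jets that pass the \(p_T\) cut is an assumption.

```python
PT_MIN = 25.0          # GeV, lepton and jet threshold for these versions
B_TAG_THRESHOLD = 0.5  # jets with a score above this count as b-tagged


def classify_event(lepton_pts, jets):
    """Return '4j1b', '4j2b', or None if the event fails the selection.

    lepton_pts: pT values in GeV for all electrons and muons in the event
    jets: list of (pt, btag_score) tuples
    """
    # Exactly one lepton above threshold.
    if sum(1 for pt in lepton_pts if pt > PT_MIN) != 1:
        return None
    # At least four jets above threshold.
    good_jets = [(pt, score) for pt, score in jets if pt > PT_MIN]
    if len(good_jets) < 4:
        return None
    # Assumption: b-tags are counted among the selected jets only.
    n_btag = sum(1 for _, score in good_jets if score > B_TAG_THRESHOLD)
    if n_btag == 0:
        return None
    return "4j1b" if n_btag == 1 else "4j2b"


jets_1b = [(60.0, 0.9), (45.0, 0.1), (35.0, 0.2), (30.0, 0.3)]
jets_2b = [(60.0, 0.9), (45.0, 0.8), (35.0, 0.2), (30.0, 0.3)]
print(classify_event([40.0, 10.0], jets_1b))  # 4j1b
print(classify_event([40.0], jets_2b))        # 4j2b
print(classify_event([40.0, 35.0], jets_1b))  # None (two leptons pass)
```

The two regions are mutually exclusive by construction, so each selected event lands in exactly one of them.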

Subsequent versions modify this selection to better reflect common practice in CMS, using the following cuts:

  • Leptons (electrons and muons) must have \(p_T>30\) GeV, \(|\eta|<2.1\), and sip3d<4 (significance of the 3D impact parameter)

  • For electrons, we also require cutBased==4 (tight)

  • For muons, we also require tightId and pfRelIso04_all<0.15 (PF relative isolation within \(\Delta R=0.4\), with deltaBeta corrections)

  • Events must contain exactly one lepton

  • Jets must have \(p_T>30\) GeV and \(|\eta|<2.4\), and must satisfy isTightLeptonVeto

  • Events must have at least four jets

  • Jets are considered \(b\)-tagged if they have a \(b\)-tag score over B_TAG_THRESHOLD=0.5.

  • Events must have at least one \(b\)-tagged jet

  • 4j1b Region: Events must have exactly one \(b\)-tagged jet

  • 4j2b Region: Events must have two or more \(b\)-tagged jets
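The tightened object requirements can be sketched as three predicates, one per object type. This is an illustrative plain-Python version, not the reference implementation. The dictionary keys mirror the NanoAOD-style names used in the list, the function names are hypothetical, and the jet requirement is taken to be central jets, \(|\eta|<2.4\).

```python
def select_electron(el):
    """Electron: common lepton cuts plus the tight cut-based ID."""
    return bool(
        el["pt"] > 30.0
        and abs(el["eta"]) < 2.1
        and el["sip3d"] < 4.0
        and el["cutBased"] == 4
    )


def select_muon(mu):
    """Muon: common lepton cuts plus tight ID and relative isolation."""
    return bool(
        mu["pt"] > 30.0
        and abs(mu["eta"]) < 2.1
        and mu["sip3d"] < 4.0
        and mu["tightId"]
        and mu["pfRelIso04_all"] < 0.15
    )


def select_jet(jet):
    """Jet: pT and central-eta cuts plus the tight lepton-veto ID."""
    return bool(
        jet["pt"] > 30.0 and abs(jet["eta"]) < 2.4 and jet["isTightLeptonVeto"]
    )


electron = {"pt": 42.0, "eta": -1.3, "sip3d": 1.2, "cutBased": 4}
muon = {"pt": 35.0, "eta": 0.4, "sip3d": 0.8, "tightId": True, "pfRelIso04_all": 0.30}
jet = {"pt": 55.0, "eta": 2.0, "isTightLeptonVeto": True}
print(select_electron(electron), select_muon(muon), select_jet(jet))
```

In this example the muon fails only the isolation requirement. The event-level requirements (exactly one lepton, at least four jets, and the \(b\)-tag regions) are then applied to the objects that pass these predicates.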