AGC Analysis Task Versions#

The table below gives a brief overview of the AGC versions. Each version corresponds to a slightly different analysis task.

Versions#
Version	Datasets	Cuts	Systematics	Machine Learning
0	CMS 2015 Open Data (POET)	Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5	\(t\bar{t}\) sample variations, `pt_scale` variations, `pt_res` variations, `btag` variations, W + jets scale variations, luminosity	None
1	CMS 2015 Open Data (NanoAOD)	Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5	\(t\bar{t}\) sample variations, `pt_scale` variations, `pt_res` variations, `btag` variations, W + jets scale variations, luminosity	None
2 (WIP)	CMS 2015 Open Data (NanoAOD)	Exactly one lepton with \(p_T>30\) GeV; at least four jets with \(p_T>30\) GeV; at least one jet with \(b\)-tag > 0.5 (see Cuts for additional cuts)	\(t\bar{t}\) sample variations, `pt_scale` variations, `pt_res` variations, `btag` variations, W + jets scale variations, luminosity	BDT to predict jet-parton assignment in \(t\bar{t}\) events

Reference Implementation Versions#

This section is specific to the implementation in the main repository.

The table below gives a brief overview of the tags of the reference implementation, including implementation-specific minor versions and patches. Major versions (0, 1, and 2) correspond to differences in the analysis task (described above), while minor versions are reserved for individual implementations to assign for small changes and patches. The reference implementation for each task version (0, 1, 2) is always the latest tag within that series.

Tags#
Tag	Version	Available Pipelines	Systematics	Dependency Management
0.1.0	0	Pure `coffea`; `coffea` with `ServiceX` processor	Systematic variations within `coffea` processor are manually calculated using `awkward` array logic (jet \(p_T\) variations are not propagated through signal region observable calculation)	Functions used in `coffea` processor are defined in the notebook
0.2.0	0	Pure `coffea`; create cached files using `ServiceX` queries followed by standalone `coffea` processing	Systematic variations within `coffea` processor are manually calculated using `awkward` array logic (jet \(p_T\) variations are not propagated through signal region observable calculation)	Functions used in `coffea` processor are defined in the notebook
1.0.0	1	Pure `coffea`; create cached files using `ServiceX` queries followed by standalone `coffea` processing	Systematic variations within `coffea` processor are manually calculated using `awkward` array logic (jet \(p_T\) variations are not propagated through signal region observable calculation)	Functions used in `coffea` processor are defined in the notebook
1.1.0	1	Pure `coffea`; create cached files using `ServiceX` queries followed by standalone `coffea` processing	Systematic variations within `coffea` processor are manually calculated using `awkward` array logic (jet \(p_T\) variations corrected)	Functions used in `coffea` processor are defined in the notebook
1.2.0	1	Pure `coffea`; create cached files using `ServiceX` queries followed by standalone `coffea` processing	Systematic variations within `coffea` processor are manually calculated using `awkward` array logic (b-tagging cuts corrected, no 1e-6 offsetting of histogram yields)	Functions used in `coffea` processor are defined in the notebook
2.0.0 (WIP)	2	Pure `coffea`; create cached files using `ServiceX` queries followed by standalone `coffea` processing; optional machine learning component (with additional option to use `NVIDIA Triton` inference server)	Systematic variations within `coffea` processor handled by `correctionlib`	Modules are shipped to `dask` workers using `cloudpickle`
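
As a minimal sketch of the tagging rule described above (the major number identifies the task version, and the latest tag within a series is the reference implementation), the tags from the table can be grouped as follows. The helper names `parse_tag` and `latest_tag_per_task` are hypothetical and not part of the repository.

```python
from collections import defaultdict

# Tags copied from the table above; the "(WIP)" marker is dropped for parsing.
TAGS = ["0.1.0", "0.2.0", "1.0.0", "1.1.0", "1.2.0", "2.0.0"]


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' tag into a tuple of integers."""
    major, minor, patch = (int(part) for part in tag.split("."))
    return major, minor, patch


def latest_tag_per_task(tags: list[str]) -> dict[int, str]:
    """Map each task version (major number) to its most recent tag."""
    by_major: dict[int, list[str]] = defaultdict(list)
    for tag in tags:
        by_major[parse_tag(tag)[0]].append(tag)
    # Compare tags numerically rather than as strings, so 1.10.0 > 1.2.0.
    return {major: max(group, key=parse_tag) for major, group in by_major.items()}


reference = latest_tag_per_task(TAGS)
print(reference)  # {0: '0.2.0', 1: '1.2.0', 2: '2.0.0'}
```

Comparing parsed integer tuples instead of raw strings keeps the ordering correct if a series ever reaches a two-digit minor version.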

Datasets#

The datasets used for the CMS \(t\bar{t}\) notebook are from the 2015 CMS Open Data release. Versions 0.1.0 and 0.2.0 use ntuples generated using the Physics Objects Extractor Tool (POET).

All versions from 1.0.0 onward use NanoAOD instead. The NanoAOD files were generated from the 2015 CMS Open Data release using the following CMSSW pull request: cms-sw/cmssw#39040. To set up the environment, run the following commands:

source /cvmfs/cms.cern.ch/cmsset_default.sh
scram list CMSSW_10_6_
scram project CMSSW_10_6_30
cd CMSSW_10_6_30/
cmsenv
cd src/
git cms-merge-topic 39040
ls -al
scram build -j5

Once the build has finished, use the following command for data:

cmsDriver.py --python_filename doublemuon_cfg.py --eventcontent NANOAOD --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --fileout file:doublemuon_nanoaod.root --conditions 106X_dataRun2_v36 --step NANO --filein file:doublemuon_miniaod.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --data -n -1

For MC, you can use:

cmsDriver.py --python_filename nanoaod15_cfg.py --eventcontent NANOAODSIM --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAODSIM --fileout file:nanoaod15.root --conditions 102X_mcRun2_asymptotic_v8 --step NANO --filein file:miniaod2015.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --mc -n -1
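
The data and MC commands differ only in the event content and data tier, the conditions tag, the `--data`/`--mc` switch, and the file names. The following sketch assembles both from one hypothetical helper, `build_cmsdriver_args`, which is not part of the referenced repository. It uses only the options that appear in the commands above and builds the command string without running anything.

```python
import shlex


def build_cmsdriver_args(is_data: bool, cfg_name: str,
                         input_file: str, output_file: str) -> list[str]:
    """Assemble the cmsDriver.py argument list for the NANO step."""
    # Values below are copied from the two commands in the text.
    tier = "NANOAOD" if is_data else "NANOAODSIM"
    conditions = "106X_dataRun2_v36" if is_data else "102X_mcRun2_asymptotic_v8"
    return [
        "cmsDriver.py",
        "--python_filename", cfg_name,
        "--eventcontent", tier,
        "--customise", "Configuration/DataProcessing/Utils.addMonitoring",
        "--datatier", tier,
        "--fileout", f"file:{output_file}",
        "--conditions", conditions,
        "--step", "NANO",
        "--filein", f"file:{input_file}",
        "--era", "Run2_25ns,run2_nanoAOD_106X2015",
        "--no_exec",
        "--data" if is_data else "--mc",
        "-n", "-1",
    ]


data_cmd = shlex.join(build_cmsdriver_args(
    True, "doublemuon_cfg.py", "doublemuon_miniaod.root", "doublemuon_nanoaod.root"))
mc_cmd = shlex.join(build_cmsdriver_args(
    False, "nanoaod15_cfg.py", "miniaod2015.root", "nanoaod15.root"))
print(data_cmd)
print(mc_cmd)
```

Returning a list of arguments rather than a single string makes it straightforward to pass the result to `subprocess.run` inside a CMSSW environment without shell quoting issues.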

The code used to generate and subsequently merge these files is located in the following repository: ekauffma/produce-nanoAODs

The same datasets are used regardless of whether the input format is MiniAOD or NanoAOD. The datasets, grouped by process, are listed below:

ttbar:
- nominal:
  - 19980: Powheg + Pythia 8 (ext3), 2413 files, 3.4 TB -> converted
  - 19981: Powheg + Pythia 8 (ext4), 4653 files, 6.4 TB -> converted
- scale variation:
  - 19982: same as below, unclear if overlap
  - 19983: Powheg + Pythia 8 “scaledown” (ext3), 902 files, 1.4 TB -> converted
  - 19984: same as below, unclear if overlap
  - 19985: Powheg + Pythia 8 “scaleup” (ext3), 917 files, 1.3 TB -> converted
- ME variation:
  - 19977: same as below, unclear if overlap
  - 19978: aMC@NLO + Pythia 8 (ext1), 438 files, 647 GB -> converted
- PS variation:
  - 19999: Powheg + Herwig++, 443 files, 810 GB -> converted
single top:
- s-channel:
  - 19394: aMC@NLO + Pythia 8, 114 files, 76 GB -> converted
- t-channel:
  - 19406: Powheg + Pythia 8 (antitop), 935 files, 1.1 TB -> converted
  - 19408: Powheg + Pythia 8 (top), 1571 files, 1.8 TB -> converted
- tW:
  - nominal:
    - 19412: Powheg + Pythia 8 (antitop), 27 files, 30 GB -> converted
    - 19419: Powheg + Pythia 8 (top), 23 files, 30 GB -> converted
  - DS:
    - 19410: Powheg + Pythia 8 DS (antitop), 13 files, 15 GB
    - 19417: Powheg + Pythia 8 DS (top), 13 files, 14 GB
  - scale variations:
    - 19415: Powheg + Pythia 8 “scaledown” (antitop), 11 files, 15 GB
    - 19422: Powheg + Pythia 8 “scaledown” (top), 13 files, 15 GB
    - 19416: Powheg + Pythia 8 “scaleup” (antitop), 12 files, 14 GB
    - 19423: Powheg + Pythia 8 “scaleup” (top), 13 files, 14 GB
  - there are also larger NoFullyHadronicDecays samples: 19411, 19418
- tZ / tWZ: potentially missing in inputs, not included in /ST_*
W+jets:
- nominal (with 1l filter):
  - 20546: same as below, unclear if overlap
  - 20547: aMC@NLO + Pythia 8 (ext2), 5601 files, 4.5 TB -> converted
  - 20548: aMC@NLO + Pythia 8 (ext4), 4598 files, 3.8 TB -> converted
data:
- single muon:
  - 24119: 1916 files, 1.4 TB -> converted
- single electron:
  - 24120: 2974 files, 2.6 TB -> converted
- validated runs:
  - 24210: single txt file

More information about datasets can be found in analysis-grand-challenge/datasets/cms-open-data-2015/.
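
To give a sense of scale, the file counts and sizes of the \(t\bar{t}\) samples marked as converted in the list above can be totalled with a few lines of Python. This is only a bookkeeping sketch: the dictionary keys (`ME_var`, `PS_var`, and so on) and the helper `summarize` are hypothetical labels, and the numbers are copied from the list rather than queried from any catalogue.

```python
# (identifier, number of files, size in TB), copied from the ttbar entries
# marked "converted" in the list above.
TTBAR_SAMPLES = {
    "nominal": [(19980, 2413, 3.4), (19981, 4653, 6.4)],
    "scaledown": [(19983, 902, 1.4)],
    "scaleup": [(19985, 917, 1.3)],
    "ME_var": [(19978, 438, 0.647)],
    "PS_var": [(19999, 443, 0.810)],
}


def summarize(samples):
    """Return {variation: (total files, total size in TB)}."""
    summary = {}
    for name, entries in samples.items():
        n_files = sum(files for _ident, files, _size in entries)
        size_tb = round(sum(size for _ident, _files, size in entries), 3)
        summary[name] = (n_files, size_tb)
    return summary


totals = summarize(TTBAR_SAMPLES)
all_files = sum(n_files for n_files, _size in totals.values())
all_size_tb = round(sum(size for _n, size in totals.values()), 3)
print(totals["nominal"])       # files and TB for the nominal ttbar sample
print(all_files, all_size_tb)  # files and TB across all listed ttbar samples
```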

Cuts#

For versions 0.1.0, 0.2.0, and 1.0.0, the cuts used are the following:

- Leptons (electrons and muons) must have \(p_T>25\) GeV
- Events must contain exactly one lepton
- Jets must have \(p_T>25\) GeV
- Events must have at least four jets
- Jets are considered \(b\)-tagged if they have a \(b\)-tag score over `B_TAG_THRESHOLD=0.5`
- Events must have at least one \(b\)-tagged jet
- 4j1b Region: Events must have exactly one \(b\)-tagged jet
- 4j2b Region: Events must have two or more \(b\)-tagged jets
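
The selection above can be sketched for a single event in plain Python. The reference implementation applies these cuts with `awkward` arrays inside a `coffea` processor; the function below is only an illustration of the logic, and its name, argument layout, and the choice to count \(b\)-tags among jets that already pass the \(p_T\) cut are assumptions.

```python
B_TAG_THRESHOLD = 0.5
PT_MIN = 25.0  # GeV, applied to both leptons and jets


def select_region(lepton_pts, jets):
    """Return '4j1b', '4j2b', or None for one event.

    lepton_pts: list of lepton pT values in GeV (electrons and muons together)
    jets: list of (pT in GeV, b-tag score) tuples
    """
    # Exactly one lepton above the pT threshold.
    good_leptons = [pt for pt in lepton_pts if pt > PT_MIN]
    if len(good_leptons) != 1:
        return None
    # At least four jets above the pT threshold.
    good_jets = [(pt, btag) for pt, btag in jets if pt > PT_MIN]
    if len(good_jets) < 4:
        return None
    # Assumption: b-tags are counted only among jets passing the pT cut.
    n_btag = sum(1 for _pt, btag in good_jets if btag > B_TAG_THRESHOLD)
    if n_btag == 0:
        return None
    return "4j1b" if n_btag == 1 else "4j2b"


# The 20 GeV jet fails the pT cut, so its high b-tag score is not counted.
jets = [(60.0, 0.9), (45.0, 0.3), (33.0, 0.2), (27.0, 0.1), (20.0, 0.8)]
print(select_region([40.0, 12.0], jets))  # 4j1b
```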

Subsequent versions modify this selection to better reflect common practice in CMS, using the following cuts:

- Leptons (electrons and muons) must have \(p_T>30\) GeV, \(|\eta|<2.1\), and `sip3d<4` (significance of 3D impact parameter)
  - For electrons, we also require `cutBased==4` (tight)
  - For muons, we also require `tightId` and `pfRelIso04_all<0.15` (PF relative isolation with dR=0.4, total, with deltaBeta corrections)
- Events must contain exactly one lepton
- Jets must have \(p_T>30\) GeV and \(|\eta|<2.4\) and satisfy a tight lepton veto (`isTightLeptonVeto`, or `jetId==6`)
- Events must have at least four jets
- Jets are considered \(b\)-tagged if they have a \(b\)-tag score over `B_TAG_THRESHOLD=0.5`
- Events must have at least one \(b\)-tagged jet
- 4j1b Region: Events must have exactly one \(b\)-tagged jet
- 4j2b Region: Events must have two or more \(b\)-tagged jets
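
The tighter object definitions can be sketched as per-object predicates followed by the same event-level counting. This is a plain-Python illustration rather than the `coffea` implementation: objects are represented as dictionaries keyed by the field names quoted above, while the `btag` key for the \(b\)-tag score and all function names are hypothetical.

```python
B_TAG_THRESHOLD = 0.5


def is_good_electron(el: dict) -> bool:
    """Common lepton cuts plus the tight cut-based ID (cutBased == 4)."""
    return (el["pt"] > 30 and abs(el["eta"]) < 2.1 and el["sip3d"] < 4
            and el["cutBased"] == 4)


def is_good_muon(mu: dict) -> bool:
    """Common lepton cuts plus tight ID and PF relative isolation."""
    return (mu["pt"] > 30 and abs(mu["eta"]) < 2.1 and mu["sip3d"] < 4
            and bool(mu["tightId"]) and mu["pfRelIso04_all"] < 0.15)


def is_good_jet(jet: dict) -> bool:
    """Kinematic cuts plus the tight lepton veto (jetId == 6)."""
    return jet["pt"] > 30 and abs(jet["eta"]) < 2.4 and jet["jetId"] == 6


def select_region(electrons, muons, jets):
    """Return '4j1b', '4j2b', or None for one event."""
    n_leptons = (sum(map(is_good_electron, electrons))
                 + sum(map(is_good_muon, muons)))
    if n_leptons != 1:
        return None
    good_jets = [jet for jet in jets if is_good_jet(jet)]
    if len(good_jets) < 4:
        return None
    # Assumption: b-tags are counted only among jets passing the jet cuts.
    n_btag = sum(jet["btag"] > B_TAG_THRESHOLD for jet in good_jets)
    if n_btag == 0:
        return None
    return "4j1b" if n_btag == 1 else "4j2b"


muon = {"pt": 42.0, "eta": 0.3, "sip3d": 1.2, "tightId": True,
        "pfRelIso04_all": 0.05}
jets = [{"pt": 55.0, "eta": 1.0, "jetId": 6, "btag": score}
        for score in (0.9, 0.7, 0.2, 0.1)]
print(select_region([], [muon], jets))  # 4j2b
```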
