AGC Versions#

The below table gives a brief overview of all AGC versions.

AGC Versions#
Version	Datasets	Available Pipelines	Cuts	Machine Learning
0.1.0	CMS 2015 Open Data (POET)	Pure `coffea`; `coffea` with `ServiceX` processors; `ServiceX` followed by `coffea`	Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5	None
0.2.0	CMS 2015 Open Data (POET)	Pure `coffea`; `ServiceX` followed by `coffea`	Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5	None
1.0.0	CMS 2015 Open Data (NanoAOD)	Pure `coffea`; `ServiceX` followed by `coffea`	Exactly one lepton with \(p_T>25\) GeV; at least four jets with \(p_T>25\) GeV; at least one jet with \(b\)-tag > 0.5	None
2.0.0	CMS 2015 Open Data (NanoAOD)			BDT to predict jet-parton assignment in \(t\bar{t}\) events

Datasets#

The datasets used for the CMS \(t\bar{t}\) notebook are from the 2015 CMS Open Data release. Versions 0.1.0 and 0.2.0 use ntuples generated using the Physics Objects Extractor Tool (POET).

All versions >=1.0.0 use NanoAOD instead. The NanoAOD was generated from the 2015 CMS Open Data release using this pull request of CMSSW: cms-sw/cmssw#39040. To set this up, the following commands should be run:

source /cvmfs/cms.cern.ch/cmsset_default.sh
scram list CMSSW_10_6_
scram project CMSSW_10_6_30
cd CMSSW_10_6_30/
cmsenv
cd src/
git cms-merge-topic 39040
ls -al
scram build -j5

From this point, for data, you can use:

cmsDriver.py --python_filename doublemuon_cfg.py --eventcontent NANOAOD --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --fileout file:doublemuon_nanoaod.root --conditions 106X_dataRun2_v36 --step NANO --filein file:doublemuon_miniaod.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --data -n -1

For MC, you can use:

cmsDriver.py --python_filename nanoaod15_cfg.py --eventcontent NANOAODSIM --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAODSIM --fileout file:nanoaod15.root --conditions 102X_mcRun2_asymptotic_v8 --step NANO --filein file:miniaod2015.root --era Run2_25ns,run2_nanoAOD_106X2015 --no_exec --mc -n -1

The code used to generate and subsequently merge these files is located in the following repository: ekauffma/produce-nanoAODs

The data used is the same, regardless of MiniAOD vs NanoAOD. The list of datasets separated by process is included below:

ttbar:
- nominal:
  - 19980: Powheg + Pythia 8 (ext3), 2413 files, 3.4 TB -> converted
  - 19981: Powheg + Pythia 8 (ext4), 4653 files, 6.4 TB -> converted
- scale variation:
  - 19982: same as below, unclear if overlap
  - 19983: Powheg + Pythia 8 “scaledown” (ext3), 902 files, 1.4 TB -> converted
  - 19984: same as below, unclear if overlap
  - 19985: Powheg + Pythia 8 “scaleup” (ext3), 917 files, 1.3 TB -> converted
- ME variation:
  - 19977: same as below, unclear if overlap
  - 19978: aMC@NLO + Pythia 8 (ext1), 438 files, 647 GB -> converted
- PS variation:
  - 19999: Powheg + Herwig++, 443 files, 810 GB -> converted
single top:
- s-channel:
  - 19394: aMC@NLO + Pythia 8, 114 files, 76 GB -> converted
- t-channel:
  - 19406: Powheg + Pythia 8 (antitop), 935 files, 1.1 TB -> converted
  - 19408: Powheg + Pythia 8 (top), 1571 files, 1.8 TB -> converted
- tW:
  - nominal:
    - 19412: Powheg + Pythia 8 (antitop), 27 files, 30 GB -> converted
    - 19419: Powheg + Pythia 8 (top), 23 files, 30 GB -> converted
  - DS:
    - 19410: Powheg + Pythia 8 DS (antitop), 13 files, 15 GB
    - 19417: Powheg + Pythia 8 DS (top), 13 files, 14 GB
  - scale variations:
    - 19415: Powheg + Pythia 8 “scaledown” (antitop), 11 files, 15 GB
    - 19422: Powheg + Pythia 8 “scaledown” (top), 13 files, 15 GB
    - 19416: Powheg + Pythia 8 “scaleup” (antitop), 12 files, 14 GB
    - 19423: Powheg + Pythia 8 “scaleup” (top), 13 files, 14 GB
  - there are also larger NoFullyHadronicDecays samples: 19411, 19418
- tZ / tWZ: potentially missing in inputs, not included in /ST_*
W+jets:
- nominal (with 1l filter):
  - 20546: same as below, unclear if overlap
  - 20547: aMC@NLO + Pythia 8 (ext2), 5601 files, 4.5 TB -> converted
  - 20548: aMC@NLO + Pythia 8 (ext4), 4598 files, 3.8 TB -> converted
data:
- single muon:
  - 24119: 1916 files, 1.4 TB -> converted
- single electron:
  - 24120: 2974 files, 2.6 TB -> converted
- validated runs:
  - 24210: single txt file

More information about datasets can be found in analysis-grand-challenge/datasets/cms-open-data-2015/.

Cuts#

For versions 0.1.0, 0.2.0, and 1.0.0, the cuts used are the following:

Leptons (electrons and muons) must have \(p_T>25\) GeV
Events must contain exactly one lepton
Jets must have \(p_T>25\) GeV
Events must have at least four jets
Jets are considered \(b\)-tagged if they have a \(b\)-tag score over B_TAG_THRESHOLD=0.5.
Events must have at least one \(b\)-tagged jet
4j1b Region: Events must have exactly one \(b\)-tagged jet
4j2b Region: Events must have two or more \(b\)-tagged jets

This is modified to better reflect common practices in CMS in subsequent versions, using the following cuts:

Leptons (electrons and muons) must have \(p_T>30\) GeV, \(|\eta|<2.1\), and sip3d<4 (significance of 3d impact parameter)
For electrons, we also require cutBased==4 (tight)
For muons, we also require tightId and pfRelIso04_all<0.15 (PF relative isolation dR=0.4, total (deltaBeta corrections))
Events must contain exactly one lepton
Jets must have \(p_T>30\) GeV, \(|\eta|>2.4\), and isTightLeptonVeto
Events must have at least four jets
Jets are considered \(b\)-tagged if they have a \(b\)-tag score over B_TAG_THRESHOLD=0.5.
Events must have at least one \(b\)-tagged jet
4j1b Region: Events must have exactly one \(b\)-tagged jet
4j2b Region: Events must have two or more \(b\)-tagged jets

AGC Versions

Contents

AGC Versions#

Datasets#

Cuts#