Tags: python, pandas, physics, uproot

Retrieve data in Pandas


I am using pandas and uproot to read data from a .root file, and I get a table like the following one:

[image: the resulting DataFrame]

The aforementioned table is made with the following code:

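import uproot  # ttree.pandas.df below is the uproot 3 interface

fname = 'ZZ4lAnalysis_VBFH.root'
key = 'ZZTree/candTree'
ttree = uproot.open(fname)[key]
branches = ['Z1Flav', 'Z2Flav', 'nCleanedJetsPt30', 'LepPt', 'LepLepId']
# flatten=False keeps one row per event; jagged branches stay as per-row arrays
df = ttree.pandas.df(branches, flatten=False)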

I need to find the maximum value in LepPt for each row and, once I have it, retrieve the LepLepId entry at the same position as that maximum. Finding the maximum values is no problem:

Pt_l1 = [max(i) for i in df.LepPt]

This gives me a list with the maximum of each row. However, I have to separate these values according to LepLepId: I need one array with the maximum LepPt where |LepLepId| = 11 and another where |LepLepId| = 13.

If someone could give me any hint, advice and/or suggestion, I would be very grateful.


Solution

  • I made some mock data since you didn't provide yours in any easy format. I think this is what you are looking for.

    import pandas as pd

    df = pd.DataFrame.from_records(
        [   [[1,2,3], [4,5,6]],
            [[4,6,5], [7,8,9]]
        ],
        columns=['LepPt', 'LepLepId']
    )

    # maximum LepPt of each row
    df['max_LepPt'] = [max(i) for i in df.LepPt]

    def f(row):
        # position of the maximum within the row's LepPt list
        pos = row['LepPt'].index(row['max_LepPt'])
        # entry of LepLepId at that same position
        return row['LepLepId'][pos]

    df['same_index_LepLepId'] = df.apply(f, axis=1)

    This returns:

        LepPt       LepLepId    max_LepPt   same_index_LepLepId
    0   [1, 2, 3]   [4, 5, 6]   3           6
    1   [4, 6, 5]   [7, 8, 9]   6           8
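
To go one step further and split the maxima by |LepLepId|, as the question asks, something along these lines may work. It is only a sketch on made-up values, not the real tree. The names lead_pos, max_LepId, Pt_l1_e and Pt_l1_mu are hypothetical and chosen just for illustration. np.argmax is used instead of list.index so that the same lines should also work if the cells hold NumPy arrays rather than plain lists, which may be the case for data read with uproot.

```python
import numpy as np
import pandas as pd

# Mock data standing in for the real tree (values are invented for illustration).
df = pd.DataFrame({
    'LepPt': [[40.1, 25.3, 60.2, 10.5],
              [33.0, 55.5, 20.0, 12.1],
              [18.2, 70.4, 45.0, 30.3]],
    'LepLepId': [[11, -11, 13, -13],
                 [-13, 13, 11, -11],
                 [11, -11, -11, 11]],
})

# Position of the highest-Pt lepton in each row.
lead_pos = [int(np.argmax(pt)) for pt in df['LepPt']]

# Maximum LepPt and the LepLepId found at the same position.
df['max_LepPt'] = [pt[i] for pt, i in zip(df['LepPt'], lead_pos)]
df['max_LepId'] = [ids[i] for ids, i in zip(df['LepLepId'], lead_pos)]

# Split the maxima by |LepLepId| (11 and 13, as in the question).
abs_id = df['max_LepId'].abs()
Pt_l1_e = df.loc[abs_id == 11, 'max_LepPt'].to_numpy()
Pt_l1_mu = df.loc[abs_id == 13, 'max_LepPt'].to_numpy()

print(Pt_l1_e)
print(Pt_l1_mu)
```

Taking the absolute value of the selected ID once, after the lookup, keeps the sign handling in a single place, so both charges end up in the same array.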