Search code examples
pythonpandasdataframeaverage

Calculating averages dynamically with python


I have a large dataset with thousands of rows though fewer columns, i have ordered them by row values so that each of the 'objects' are grouped together, just like the dataset in Table1 below:

#Table1 :

data = [['ALFA', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05], 
        ['ALFA', 401740.00, 0.43, 0.26, 0.23, 0.16, 0.09], 
        ['ALFA', 892350.00, 0.58, 0.24, 0.05, 0.07, 0.4], 
        ['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04], 
        ['Charlie', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Charlie', 401740.00, 0.43, 0.26, 0.14, 0.37, 0.06], 
        ['Charlie', 511830.00, 0.52, 0.16, 0.13, 0.22, 0.01], 
        ['Delta', 590030.00, 0.75, 0.2, 0.34, 0.3, 0], 
        ['Delta', 590030.00, 0.75, 0.2, 0, 0.28, 0], 
        ['Delta', 351740.00, 0.31, 0.22, 0.44, 0.19, 0.05], 
        ['Echo', 892350.00, 0.58, 0.24, 0.23, 0.16, 0.09], 
        ['Echo', 590030.00, 0.75, 0.2, 0.05, 0.07, 0.4], 
        ['Echo', 590030.00, 0.75, 0.2, 0.08, 0.26, 0], 
        ['Echo', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Foxtrot', 401740.00, 0.43, 0.26, 0.27, 0.2, 0.01], 
        ['Foxtrot', 511830.00, 0.52, 0.16, 0.29, 0.11, 0.04], 
        ['Golf', 590030.00, 0.75, 0.2, 0.27, 0.2, 0.01], 
        ['Golf', 590030.00, 0.75, 0.2, 0.14, 0.37, 0.06], 
        ['Golf', 351740.00, 0.31, 0.22, 0.13, 0.22, 0.01], 
        ['Hotel', 892350.00, 0.58, 0.24, 0.34, 0.3, 0], 
        ['Hotel', 590030.00, 0.75, 0.2, 0, 0.28, 0], 
        ['Hotel', 590030.00, 0.75, 0.2, 0.29, 0.11, 0.04]]

df = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])

df

However i would like to partition the data by these objects and get the averages for all the columns only in a separate table much like Table2 below:

#Table2: 

data2 = [['ALFA', 548610.00, 0.44, 0.24, 0.24, 0.14, 0.18], 
        ['Bravo', 511830.00, 0.52, 0.16, 0.08, 0.26, 0], 
        ['Charlie', 545615.00, 0.66, 0.20, 0.21, 0.25, 0.03], 
        ['Delta', 510600.00, 0.60, 0.21, 0.26, 0.26, 0.02], 
        ['Echo', 665610.00, 0.71, 0.21, 0.13, 0.22, 0.14], 
        ['Foxtrot', 456785.00, 0.48, 0.21, 0.28, 0.16, 0.03], 
        ['Golf', 510600.00, 0.60, 0.21, 0.18, 0.26, 0.03], 
        ['Hotel', 690803.33, 0.69, 0.21, 0.21, 0.23, 0.01]]

df2 = pd.DataFrame(data, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'])

df2

Please note that the number of the objects vary across the dataset so the query would be able to count the number of objects and use that number to get the average of all the columns for each object and then present all these values in a new table.

For instance note that the '548610.00' values in Table2 for ALFA(column1) is merely an addition of Column1 values of ALFA in Table1 (351740.00 + 401740.00 + 401740.00) and divide by the count of ALFA being '3'.


Solution

  • Just use the groupby() function from pandas:

    df.groupby('Objects').mean()
    

    Instead of mean() other functions like min() are possible as well.