python pandas sorting average spreadsheet

Is it possible to average only certain sections of a spreadsheet with Python by specifying the sections you want based on another factor?


I am trying to average the sea temperature for the fall and spring of each year in my data set. Imagine three columns (year/season/temp) with rows such as 1963, FALL, 75, continuing through various years and the spring/fall seasons. How could I write code to find the average of the temperatures in the fall of 1963, then the spring of 1963, then the fall of 1964, and so on all the way to 2021? My goal is to show how the temperature changes over time using those averages.

I only have the temperature vs. time scatter plot as of now. I wasn't expecting to have any issues, but I think having multiple temperatures for each year that contradict each other (because fall and spring aren't separated) is really hurting my R² value.


Solution

  • With pandas, you can perform a groupby on the DataFrame. Assuming the column names are year, season, and Temp, something like the following should work:

    import pandas as pd
    
    # df is assumed to already hold the data, with columns year, season, and Temp
    avg_df = df.groupby(['year', 'season']).agg({'Temp': ['mean', 'std']})
    avg_df.columns = ['Mean', 'STD']
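To get from the grouped averages to something you can follow over time, one option is to reshape the result so each season becomes its own column. The following is only a minimal sketch: the sample values are made up for illustration, and the column names (year, season, Temp) are the same assumption as above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data set
df = pd.DataFrame({
    "year":   [1963, 1963, 1963, 1963, 1964, 1964],
    "season": ["FALL", "FALL", "SPRING", "SPRING", "FALL", "SPRING"],
    "Temp":   [75.0, 77.0, 60.0, 62.0, 74.0, 61.0],
})

# Mean temperature for each (year, season) pair
seasonal = df.groupby(["year", "season"])["Temp"].mean()

# Move the season level into columns: one row per year, one column per season
by_season = seasonal.unstack("season")
print(by_season)
```

With the data in this shape, by_season["FALL"] and by_season["SPRING"] are separate yearly series, so each season can be plotted or fitted on its own instead of mixing both in one scatter plot.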