python pandas dataframe dataset mean

How do I average row values using Python and create a new data frame?


I'm currently working with a data frame.

Data

The years run all the way to 2022. What I'm trying to do is create a new data frame that has two columns:

1) Year
2) Average of the 'extent' column for each year

To illustrate:

Year    Average Extent
1978    12.487
1979    12.320

and so on until 2022.

I tried the pandas .groupby method and a few other approaches, such as

new_df = df[['year','Extent...']].copy()

but I'm unsure how to group the rows by year and compute the average 'extent' value for each year.

I hope this makes sense.

Any advice/tips will be appreciated. Thanks a lot!



Solution

  • This should run out of the box:

    import pandas as pd
    
    df = pd.read_csv("https://raw.githubusercontent.com/Shambok97/Sample-Data-/main/data.csv")
    
    # Removing whitespace at the ends of the column names for convenience
    df.columns = df.columns.str.strip()
    
    out = df.groupby("Year")["Extent (10^6 sq km)"].mean()
    

    out:

    Year
    1978    12.48700
    1979    12.31956
    Name: Extent (10^6 sq km), dtype: float64
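
Note that the groupby result above is a Series indexed by Year, not a data frame. If you want the two-column data frame described in the question, one option is to turn the index back into a column with reset_index. This is only a minimal sketch: the sample rows below are made up so that it runs without the CSV, and the column name "Average Extent" is just an illustrative choice.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV (values are made up).
# The leading space in the column name mimics untidy headers that need stripping.
df = pd.DataFrame({
    "Year": [1978, 1978, 1979, 1979],
    " Extent (10^6 sq km)": [12.0, 13.0, 11.5, 12.5],
})

# Remove whitespace at the ends of the column names
df.columns = df.columns.str.strip()

# Mean extent per year; this is a Series indexed by Year
out = df.groupby("Year")["Extent (10^6 sq km)"].mean()

# Move Year from the index into a regular column and name the values column
yearly = out.reset_index(name="Average Extent")

print(yearly)
```

Here yearly is a regular DataFrame with the columns Year and Average Extent, one row per year.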