python pandas dataframe for-loop group-by

How to write a loop to replace all NaN values using groupby?


I have this dataset: https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022. It has lots of NaN values, and the 'Country' column contains many duplicate values, which I need to keep. I want to group by 'Country', compute each country's mean for the other columns (Total, Coal, Per Capita, ...), and use those means to replace the NaN values in the original DataFrame. Is there any way to do this?
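
A minimal sketch of one common approach, assuming every numeric column except 'Year' should be filled: `groupby('Country')` combined with `transform('mean')` returns each row's own country mean, aligned to the original index, so `fillna` can use it directly without any explicit loop over countries. The small DataFrame below is made-up data that only reuses the dataset's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Kaggle CSV (same column names)
df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'ISO 3166-1 alpha-3': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Year': [2020, 2021, 2022, 2020, 2021],
    'Total': [1.0, np.nan, 3.0, np.nan, 8.0],
    'Coal': [np.nan, np.nan, 2.0, 5.0, 7.0],
})

# Assumption: fill every numeric column except Year
value_cols = df.select_dtypes('number').columns.drop('Year').tolist()

# transform('mean') gives each row its own country's mean (NaNs are skipped)
country_means = df.groupby('Country')[value_cols].transform('mean')

# Replace only the missing values; existing values are left untouched
df[value_cols] = df[value_cols].fillna(country_means)
print(df)
```

A country whose column is entirely NaN has no mean to borrow, so those cells stay NaN.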

I wrote a piece of code, but I can't develop it further:

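import pandas as pd

df = pd.read_csv('GCB2022v27_MtCO2_flat.csv')
countryunique = df['Country'].unique()
# Per-country means of the numeric columns (numeric_only skips the ISO code column)
dc = df.groupby('Country').mean(numeric_only=True)

for country in countryunique:
    if country in dc.index:
        # Fill only this country's missing 'Total' values with its own mean
        mask = df['Country'] == country
        df.loc[mask, 'Total'] = df.loc[mask, 'Total'].fillna(dc.loc[country, 'Total'])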

I also tried this, but it is not correct: it fills every gap with the mean of the whole column, not with each country's mean:

# Note: this uses the mean of the whole column, not each country's mean
for col in df.columns:
    if col not in ('Country', 'ISO 3166-1 alpha-3', 'Year'):
        df[col] = df[col].fillna(df[col].mean())
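
To see why a column-wide mean is not what the question asks for, here is a small comparison on made-up data (hypothetical countries A and B). It also passes the mean as a number rather than wrapping it in `str()`, since a string fill value would put text into a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two countries with very different emission levels
df = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Total': [1.0, np.nan, 100.0, np.nan],
})

# Column-wide mean: (1 + 100) / 2 = 50.5 is used for both countries
overall = df['Total'].fillna(df['Total'].mean())

# Per-country mean: A's gap gets A's mean, B's gap gets B's mean
per_country = df['Total'].fillna(df.groupby('Country')['Total'].transform('mean'))

print(overall.tolist())      # [1.0, 50.5, 100.0, 50.5]
print(per_country.tolist())  # [1.0, 1.0, 100.0, 100.0]
```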

Solution

  • You want to calculate the mean of every column per country, right?

    I'm not sure whether this is what you need, but I'm using pivot_table() instead of groupby():

    import pandas as pd

    df = pd.read_csv('GCB2022v27_MtCO2_flat.csv')
    # One row per (Country, ISO code) pair with the mean of each remaining column
    dc = df.pivot_table(index=['Country', 'ISO 3166-1 alpha-3'], aggfunc='mean').reset_index()

    This is the output: [output screenshot not preserved]
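
The pivot table only computes the means; it does not put them back into the original rows. One way to do that (a sketch, not necessarily what the answer intended) is to left-merge the means onto the original key columns and pass the result to `fillna`. The toy DataFrame below is hypothetical and only reuses the dataset's column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the dataset's column names
df = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'ISO 3166-1 alpha-3': ['AAA', 'AAA', 'BBB', 'BBB'],
    'Year': [2020, 2021, 2020, 2021],
    'Total': [1.0, np.nan, 10.0, 20.0],
    'Coal': [np.nan, 4.0, 6.0, np.nan],
})

keys = ['Country', 'ISO 3166-1 alpha-3']
value_cols = ['Total', 'Coal']  # assumption: the columns to fill

# Per-country means, as in the answer above
dc = df.pivot_table(index=keys, aggfunc='mean').reset_index()

# A left merge keeps one row per original row, in the original order
means = df[keys].merge(dc[keys + value_cols], on=keys, how='left')
means.index = df.index  # align with df even if df has a non-default index

df[value_cols] = df[value_cols].fillna(means[value_cols])
print(df)
```

Merging on both the country name and the ISO code avoids mixing up rows if two entries ever share a name.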