I have this dataset: https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022. It has lots of NaN values, and the 'Country' column contains many repeated values (one row per country per year) that I need to keep. I want to group by the 'Country' column, compute each country's mean for the other columns (Total, Coal, Per Capita, ...), and use those means to replace the NaN values in the original dataframe. Is there any way to do this?
I wrote a piece of code, but I cannot get it to work:
countryunique = df['Country'].unique()
dc = df.groupby('Country').mean().reset_index()
for b in dc['Country']:
    for i in countryunique:
        if (b == i):
            df['Total'] = df.fillna(dc['Country'][b])
I also tried this, but it does not give the correct result:
for col in df.columns:
    if col != 'Country' and col != 'ISO 3166-1 alpha-3' and col != 'Year':
        df[col] = df[col].fillna(str(df[col].mean()))
You want to calculate the mean of every column per country, right?
I don't know if it's okay, but I used pivot_table() instead of groupby():
import pandas as pd

df = pd.read_csv('GCB2022v27_MtCO2_flat.csv')
dc = df.pivot_table(index=['Country', 'ISO 3166-1 alpha-3'], aggfunc='mean').reset_index()
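That gives one row of means per country. To put those means back into the original dataframe in place of the NaN values, one option is groupby() with transform('mean'), which returns the per-country mean aligned to every original row so it can be passed straight to fillna(). The sketch below is only an illustration: it uses a small made-up frame instead of the real CSV, and it assumes the key columns are 'Country', 'ISO 3166-1 alpha-3' and 'Year' as in the question.

```python
import numpy as np
import pandas as pd

# Made-up toy data standing in for the real CSV (values are illustrative only).
df = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "ISO 3166-1 alpha-3": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2001, 2002, 2000, 2001],
    "Total": [1.0, np.nan, 3.0, np.nan, 10.0],
    "Coal": [np.nan, 2.0, 4.0, 5.0, np.nan],
})

# Columns that identify a row and should not be averaged or filled.
key_cols = ["Country", "ISO 3166-1 alpha-3", "Year"]
value_cols = [c for c in df.columns if c not in key_cols]

# Per-country mean of each value column, broadcast back to the original row index.
country_means = df.groupby("Country")[value_cols].transform("mean")

# Replace only the NaN cells with the matching country's mean.
df[value_cols] = df[value_cols].fillna(country_means)
print(df)
```

Two things to keep in mind: if a country has no values at all in some column, its mean is NaN too, so those cells stay NaN. Also, the attempt in the question, fillna(str(df[col].mean())), fills with the mean over all countries and writes it as a string, which is why it does not give the per-country result.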