python, pandas-groupby, pandas-melt

groupby and melt in one function for COVID-19 data read from a URL with Python


I created a function that reads a COVID-19 CSV file from a URL, drops some columns, and groups the rows by 'Country/Region'. I want the function to use DataFrame.melt to keep 'Country/Region' fixed and turn all the date columns into a single 'Date' column. I get the error "The following 'id_vars' is not present in the DataFrame: ['Country/Region']".

import pandas as pd

def meltedgroupby_covid_data(hopkin_url, case_type):
    df = pd.read_csv(hopkin_url)
    df_drop = df.drop(['Province/State', 'Lat', 'Long'], axis=1)
    groupby_covid = df_drop.groupby('Country/Region').aggregate('sum')
    # This line raises the KeyError: after the groupby, 'Country/Region' is the index, not a column
    meltedgrouped = groupby_covid.melt(id_vars=['Country/Region'])
    # Name the value column after case_type (e.g. 'Confirmed')
    meltedgrouped.rename(columns={'variable': 'Date', 'value': case_type}, inplace=True)
    return meltedgrouped

Calling the function:

confirmed_statistic = meltedgroupby_covid_data('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv', 'Confirmed')

In the end I need exactly three columns: 'Country/Region', 'Date', and 'Confirmed'. Can I do the groupby and the melt in one function, or does it need to be split into two separate parts?
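The error can be reproduced without downloading the file. The sketch below uses a small made-up DataFrame (hypothetical values, but the same column layout as the JHU time-series CSV: 'Province/State', 'Country/Region', 'Lat', 'Long', then one column per date) and checks where 'Country/Region' ends up after the groupby.

```python
import pandas as pd

# Hypothetical sample in the same wide layout as the JHU time-series CSV:
# one row per province/country, one column per date (values are made up).
df = pd.DataFrame({
    "Province/State": [None, "Ontario", "Quebec"],
    "Country/Region": ["Italy", "Canada", "Canada"],
    "Lat": [41.9, 51.3, 52.9],
    "Long": [12.6, -85.3, -73.5],
    "1/22/20": [0, 1, 0],
    "1/23/20": [2, 1, 3],
})

df_drop = df.drop(["Province/State", "Lat", "Long"], axis=1)
grouped = df_drop.groupby("Country/Region").aggregate("sum")

# After the groupby, 'Country/Region' is the index, not a column.
print(grouped.index.name)                   # Country/Region
print("Country/Region" in grouped.columns)  # False

# melt only looks at columns, so it cannot find 'Country/Region' in id_vars.
try:
    grouped.melt(id_vars=["Country/Region"])
except KeyError as err:
    print("KeyError:", err)
```

The exact wording of the KeyError message differs between pandas versions, but the cause is the same: the grouping key has moved into the index.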


Solution

  • I think in this case you need to add the following statement right after the groupby:

    groupby_covid = groupby_covid.reset_index()
    

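    After the groupby, 'Country/Region' becomes the index of the result instead of a regular column, so melt cannot find it among the columns named in id_vars, and that is what raises the error you are seeing. reset_index() turns the index back into a column, so the groupby and the melt can stay in the same function.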
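Putting it together, here is a minimal sketch of the whole function with the reset_index() fix applied. It keeps the question's function name and parameters; the one deliberate change is that it passes var_name and value_name to melt (both are standard DataFrame.melt parameters) instead of renaming 'variable' and 'value' afterwards. To keep the example runnable offline, it feeds a small hypothetical CSV through io.StringIO, which pd.read_csv accepts just like a URL.

```python
import io
import pandas as pd

def meltedgroupby_covid_data(hopkin_url, case_type):
    """Read a JHU-style wide time-series CSV, sum per country, melt to long format."""
    df = pd.read_csv(hopkin_url)
    # Drop the columns that are not needed for a per-country total.
    df_drop = df.drop(["Province/State", "Lat", "Long"], axis=1)
    # Sum all provinces of a country; this makes 'Country/Region' the index.
    groupby_covid = df_drop.groupby("Country/Region").aggregate("sum")
    # Turn the index back into a regular column so melt can use it in id_vars.
    groupby_covid = groupby_covid.reset_index()
    # Every remaining column is a date, so melt stacks them into Date/value rows.
    return groupby_covid.melt(id_vars=["Country/Region"],
                              var_name="Date", value_name=case_type)

# Hypothetical sample data in the same layout as the real CSV (values made up).
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",Italy,41.9,12.6,0,2\n"
    "Ontario,Canada,51.3,-85.3,1,1\n"
    "Quebec,Canada,52.9,-73.5,0,3\n"
)

confirmed_statistic = meltedgroupby_covid_data(sample_csv, "Confirmed")
print(confirmed_statistic)
```

With the real data, pass the URL string from the question instead of sample_csv. An equivalent option is groupby('Country/Region', as_index=False), which should keep the key as a column from the start and make the separate reset_index() call unnecessary.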