Search code examples
pythonexcelgroup-bypandas-groupbydata-cleaning

Remove Date Grouping from Data


Looking to clean multiple data sets in a more automated way. The current format is year as column, month as row, the number values.

Below is an example of the current format, the original data has multiple years/months.

Current Format:

Year Jan Feb
2022 300 200

Below is an example of how I would like the new format to look like. It combines month and year into one column and transposes the number into another column.

How would I go about doing this in excel or python? Have files with many years and multiple months.

New Format:

Date Number
2022-01 300
2022-02 200

Solution

  • Check below solution. You need to extend month_df for the months, current just cater to the example.

    import pandas as pd
    
    df = pd.DataFrame({'Year':[2022],'Jan':[300],'Feb':[200]})
    
    month_df = pd.DataFrame({'Char_Month':['Jan','Feb'], 'Int_Month':['01','02']})
    
    melted_df = pd.melt(df, id_vars=['Year'], value_vars=['Jan', 'Feb'], var_name='Char_Month',value_name='Number')
    
    pd.merge(melted_df, month_df,left_on='Char_Month', right_on='Char_Month').\
    assign(Year=melted_df['Year'].astype('str')+'-'+month_df['Int_Month'])\
    [['Year','Number']]
    

    Output:

    enter image description here