Search code examples
pythonpandasdataframepandas-groupbystring-operations

combine dataframe column with similar name. and concate values with separated by ',' (comma)


input file contains the product and its price on a particular date

product  05-Oct-2020  07-Oct-2020 09-Nov-2020 13-Nov-2020
A        66.2         69.5        72.95       76.55
B        368.7        382.8       384.7       386.8

output file should, combine all the days of month in one column and concatenate values with separated with comma (,)

product   Oct-2020         Nov-2020
A         66.2, 69.5       72.95, 76.55
B         368.7, 382.8     384.7, 386.8

i tried to change column name with date format , from '1-jan-2020' to 'jan-2020' with

keys = [dt.strptime(key, "%d-%b-%Y").strftime("%B-%Y") for key in data.keys()]

and after df transpose we can use groupby.

like there is option to group by and sum the values as :-

df.groupby().sum()

is there something that can join values (string operation) with separate them with comma.

click here to get sample data

any direction is appreciated.


Solution

  • The trick is to use Grouper on the colums:

    inp = pd.read_excel("Stackoverflow sample.xlsx")
    
    df = inp.set_index("Product")
    df.columns = pd.to_datetime(df.columns)
    
    out = (
        df
        .T
        .groupby(pd.Grouper(level=0, freq="MS"))
        .agg(lambda xs: ", ".join(map(str, filter(pd.notnull, xs))))
        .T
    )
    

    Using the provided sample this yields the following 5 first rows for out: enter image description here

    If you want to convert to a particular date formatting do

    out.columns = out.columns.strftime("%b-%Y")
    

    which results in enter image description here