Search code examples
pythonpandasdatetimeindex

Python - Pandas resample dataframe with strings and floats


I have a dataframe where the index is a datetimeindex, and every row is every day over the course of a couple years. I need to resample the dataframe by month where the two float columns are summed, but the string columns are all the unique values during that month. I can do the resampling to a single column, but I don't know how to do it to everything, or how to combine them back together if I do it one at a time.

For the floats I was trying:

# go through the column list
for col in col_list:
    # process all run time columns for month
    if "float" in str(col):
        # resample for one month and sum
        df[col] = df[col].resample('M').sum()
        # rename the column
        df.rename(columns={col: col + " MONTHLY"}, inplace=True)

and for the strings:

elif "string" in str(col):
    # get all the unique jobs run during the month
    df[col] = df[col].groupby(pd.Grouper(freq='M')).unique()
    df.rename(columns={col: col + " MONTHLY"}, inplace=True)

these were resulting in the monthly data being inserted into tho the dataframe with every day still existing though, and was hard to find and not what I need.

Some sample data:

        float_1 float_2 string_1    string_2
12/30/2019  1   2   a   a
12/31/2019  1   3   a   b
1/1/2020    2   4   a   c
1/2/2020    3   5   b   d

The expected output would be:

12/2019 2   5   a    a, b
1/2020  5   9   a, b c, d

Not sure if it matters but the real data does have NaN in random days throughout the data.


Solution

  • Try aggregate numeric columns and non numeric columns separately and then join them back:

    df.index = pd.to_datetime(df.index)
    
    numerics = df.select_dtypes('number').resample('M').sum()
    strings = df.select_dtypes('object').resample('M').agg(lambda x: ','.join(set(x)))
    
    numerics.join(strings)
    #            float_1  float_2 string_1 string_2
    #2019-12-31        2        5        a      a,b
    #2020-01-31        5        9      a,b      d,c