python, pandas, group-by

Summing dataframe values and merging rows from the same day into one row with the summed values


I have a dataframe that goes like

       datetime             cnt
1     2015-01-04 00:00:00  102
2     2015-01-04 01:00:00  23
3     2015-01-04 02:00:00  45

Turns into

       datetime    cnt
1     2015-01-04  170

Here datetime is the timestamp and cnt is the number of bicycles rented on that day during that hour. I want to turn these 3 rows into 1 row that has the datetime without hours and the summed-up cnt.

Only rows that fall on the same day (but at different hours) should be merged.

import pandas as pd

path2 = r'C:\Users\bossd\OneDrive\Документы\zhopa123.csv'
df2 = pd.read_csv(path2)
# keep only the rows from 2015
col2015 = df2.loc[df2['year'] == 2015]
a = col2015['cnt']
b = col2015['timestamp']

I tried this loop to merge two consecutive rows and sum up their cnt values, but it didn't work:

for k in len(df2+1):
    if (df2.loc[k, 'timestamp']).date() == (df2.loc[k+1, 'timestamp']).date():
        df2.loc[df2.index[k], 'cnt'] + df2.loc[df2.index[k+1], 'cnt']
        df2.drop(df2.index[k+1])
    elif (df2.loc[k, 'timestamp']).date() != (df2.loc[k+1, 'timestamp']).date():
        pass

I got this error: TypeError: can only concatenate str (not "int") to str



Solution

  • IIUC, you can use .groupby:

    df = df.groupby(df["datetime"].dt.date)["cnt"].sum().reset_index()
    
    print(df)
    

    Prints:

         datetime  cnt
    0  2015-01-04  170
    

    EDIT:

    • .dt.date returns the date portion of each value in a datetime Series.

    • Series.reset_index() creates a dataframe from the Series (the index was the date; now it's a column, and the dataframe has a new index starting from 0).
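
One thing the answer takes for granted is that the column already has a datetime dtype. After a plain pd.read_csv the timestamps are usually strings, and the .dt accessor raises an error on those. Below is a minimal, self-contained sketch of the whole approach. It assumes the three sample rows from the question and a column named "datetime" as in the answer; the variable name daily is only illustrative.

```python
import pandas as pd

# Sample data mirroring the three rows shown in the question.
df = pd.DataFrame(
    {
        "datetime": [
            "2015-01-04 00:00:00",
            "2015-01-04 01:00:00",
            "2015-01-04 02:00:00",
        ],
        "cnt": [102, 23, 45],
    }
)

# Timestamps read from a CSV are typically plain strings, so convert them
# first; the .dt accessor only works on datetime-like columns.
df["datetime"] = pd.to_datetime(df["datetime"])

# Group the hourly rows by calendar date and sum the counts per day.
daily = df.groupby(df["datetime"].dt.date)["cnt"].sum().reset_index()

print(daily)
```

If the data comes from a CSV file whose column is called timestamp, as in the question's code, passing parse_dates=["timestamp"] to pd.read_csv should do the conversion at load time, and the groupby would then use df["timestamp"] instead.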