So, I need to aggregate rows where the date is the same.
My code, as of now, returns the following:
date value source
0 2018-04-08 15:52:26.110 1 ANAPRO
1 2018-04-22 12:14:38.807 1 ANAPRO
2 2018-04-22 12:34:18.403 1 ANAPRO
3 2018-04-22 12:40:35.877 1 ANAPRO
4 2018-04-22 12:53:57.897 1 ANAPRO
5 2018-04-22 13:02:45.180 1 ANAPRO
6 2018-05-04 17:41:15.840 1 ANAPRO
7 2018-04-22 15:03:54.353 1 ANAPRO
8 2018-04-22 15:24:27.030 1 ANAPRO
9 2018-04-22 15:27:56.813 1 ANAPRO
I don't think I can aggregate the rows while HH:MM:SS.ms is being shown alongside the date (I only need the date).
I've tried this:
df['date'] = pandas.to_datetime(df['date'], format='%b %d %Y.%f').astype(str)
But to no avail; I still get the same output.
The code is:
Reads my Excel file (user input).
df = pandas.read_excel(var + '.xlsx')
Selects the column I need and creates a new .xlsx to contain it.
df = df.iloc[:, 36].to_excel(var + '_.xlsx', index=False)
Opens the new .xlsx file.
df = pandas.read_excel(var + '_.xlsx')
Renames the column
df = df.rename(columns={'Prospect Dt. Cadastro': 'date'})
Adds the other columns I need.
df['value'] = 1
df['source'] = 'ANAPRO'
Tries to format the date.
df['date'] = pandas.to_datetime(df['date'], format='%b %d %Y.%f').astype(str)
Creates the final xlsx, with all the formatted data.
df = df.to_excel('payload.xlsx')
Reads the final xlsx.
df = pandas.read_excel('payload.xlsx', names=['date', 'value', 'source'])
Prints the first 10 rows.
print(df.head(10))
I'm new to Python, so sorry if I'm doing something awkward. Thank you!
IIUC, you might want pandas.Series.dt.date:
df['date'] = pandas.to_datetime(df['date']).dt.date
>>> df
date value source
0 2018-04-08 1 ANAPRO
1 2018-04-22 1 ANAPRO
2 2018-04-22 1 ANAPRO
3 2018-04-22 1 ANAPRO
4 2018-04-22 1 ANAPRO
5 2018-04-22 1 ANAPRO
6 2018-05-04 1 ANAPRO
7 2018-04-22 1 ANAPRO
8 2018-04-22 1 ANAPRO
9 2018-04-22 1 ANAPRO
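As a minimal, self-contained sketch of the line above (three sample timestamps copied from the question; the `day` column name is hypothetical and only there for illustration): `.dt.date` yields plain Python `date` objects. If you would rather keep a datetime64 column with the time set to midnight, `.dt.normalize()` is one alternative.

```python
import datetime
import pandas

# Three sample rows taken from the question's output.
df = pandas.DataFrame({
    'date': ['2018-04-08 15:52:26.110',
             '2018-04-22 12:14:38.807',
             '2018-05-04 17:41:15.840'],
    'value': 1,
    'source': 'ANAPRO',
})

parsed = pandas.to_datetime(df['date'])

# Option 1: Python date objects, no time component at all.
df['date'] = parsed.dt.date

# Option 2 (hypothetical extra column): keep datetime64, time zeroed to midnight.
df['day'] = parsed.dt.normalize()

print(df)
```

Either column can then be used as a grouping key.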
Or, if your goal is aggregation using groupby, you can retain all the information in your original date column, and group by only the date as such:
df['date'] = pandas.to_datetime(df['date'])
df.groupby(df['date'].dt.date)
# for example, to get the sum of 'value' for each day:
# df.groupby(df['date'].dt.date)[['value']].sum()
# value
# date
# 2018-04-08 1
# 2018-04-22 8
# 2018-05-04 1
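The commented example above can be run end to end roughly as follows. This is only a sketch built from the ten timestamps shown in the question; selecting `value` before summing is my own choice here, so that the text and datetime columns are not part of the aggregation.

```python
import pandas

# The ten timestamps shown in the question.
stamps = [
    '2018-04-08 15:52:26.110', '2018-04-22 12:14:38.807',
    '2018-04-22 12:34:18.403', '2018-04-22 12:40:35.877',
    '2018-04-22 12:53:57.897', '2018-04-22 13:02:45.180',
    '2018-05-04 17:41:15.840', '2018-04-22 15:03:54.353',
    '2018-04-22 15:24:27.030', '2018-04-22 15:27:56.813',
]
df = pandas.DataFrame({
    'date': pandas.to_datetime(stamps),
    'value': 1,
    'source': 'ANAPRO',
})

# Group on the calendar date only; the full timestamps stay in df['date'].
daily = df.groupby(df['date'].dt.date)['value'].sum()
print(daily)
```

The result is a Series indexed by date, with one entry per distinct day.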
Or, using pandas.Grouper:
df['date'] = pandas.to_datetime(df['date'])
df.groupby(pandas.Grouper(key='date', freq='D'))
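One thing worth knowing about the Grouper approach, sketched here with four sample timestamps from the question: with `freq='D'` the result has a row for every calendar day between the first and last timestamp, and days without any entries sum to 0. Filtering on `> 0` is just one possible way to drop them, assuming you do not want the empty days.

```python
import pandas

# Four sample timestamps taken from the question.
df = pandas.DataFrame({
    'date': pandas.to_datetime([
        '2018-04-08 15:52:26.110',
        '2018-04-22 12:14:38.807',
        '2018-04-22 12:34:18.403',
        '2018-05-04 17:41:15.840',
    ]),
    'value': 1,
    'source': 'ANAPRO',
})

# Daily bins spanning the whole range from 2018-04-08 to 2018-05-04.
daily = df.groupby(pandas.Grouper(key='date', freq='D'))['value'].sum()

# Keep only the days that actually had rows.
print(daily[daily > 0])
```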