I have a CSV file with StartDate and EndDate columns, for example 01-02-2020 00:00:00 and 01-03-2020 00:00:00.
I want a Python program that finds the dates in between and appends them as the next rows.
Each new row should increment StartDate by one day and keep EndDate as it is.
import pandas as pd
df = pd.read_csv('MyData.csv')
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['Dates'] = [pd.date_range(x, y) for x, y in zip(df['StartDate'], df['EndDate'])]
df = df.explode('Dates')
df
For example, if StartDate is 01-02-2020 00:00:00 and EndDate is 05-02-2020 00:00:00,
then as a result I should get:
All the resulting datetimes should be in the same format as the StartDate and EndDate columns in MyData.csv.
Only StartDate should change; the other columns should stay the same.
I tried doing it with date_range, but I am not getting any result. Can anyone please help me with this?
Thanks
My two cents: a very simple solution based only on pandas functions:
import pandas as pd
# Format of the dates in 'MyData.csv' (month-first; use '%d-%m-%Y %H:%M:%S'
# instead if your dates are day-first)
DT_FMT = '%m-%d-%Y %H:%M:%S'
df = pd.read_csv('MyData.csv')
# Parse dates with the provided format
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)
# Create the DataFrame with the ranges of dates: one row per day in
# [StartDate, EndDate], with every other column copied unchanged.
# Assumes StartDate and EndDate are the first two columns (row[0], row[1]).
date_df = pd.DataFrame(
    data=[[d] + list(row[1:])
          for row in df.itertuples(index=False, name=None)
          for d in pd.date_range(row[0], row[1])],
    columns=df.columns.copy()
)
# Convert dates back to strings in the same format as 'MyData.csv'
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)
If df is:
StartDate EndDate A B C
0 2020-01-02 2020-01-06 ME ME ME
1 2021-05-15 2021-05-18 KI KI KI
then date_df will be:
StartDate EndDate A B C
0 01-02-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
1 01-03-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
2 01-04-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
3 01-05-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
4 01-06-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
5 05-15-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
6 05-16-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
7 05-17-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
8 05-18-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
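To reproduce the tables above without a MyData.csv on disk, here is a minimal sketch that feeds the same two example rows through io.StringIO instead of a file. The CSV text (including the extra columns A, B and C) is made up for illustration and assumes month-first dates, matching DT_FMT.

```python
import io

import pandas as pd

# Month-first format, as assumed in the answer above
DT_FMT = '%m-%d-%Y %H:%M:%S'

# Hypothetical CSV contents standing in for 'MyData.csv'
CSV_TEXT = """StartDate,EndDate,A,B,C
01-02-2020 00:00:00,01-06-2020 00:00:00,ME,ME,ME
05-15-2021 00:00:00,05-18-2021 00:00:00,KI,KI,KI
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# One output row per day in [StartDate, EndDate]; other columns are copied
date_df = pd.DataFrame(
    data=[[d] + list(row[1:])
          for row in df.itertuples(index=False, name=None)
          for d in pd.date_range(row[0], row[1])],
    columns=df.columns.copy()
)

# Back to strings in the original CSV format
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)

print(date_df)
```

pd.date_range includes both endpoints, which is why the first range yields five rows (January 2 to January 6) and the second yields four.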
You can then save the result back to a CSV file with the to_csv method.
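If you prefer to stay closer to the attempt in the question, DataFrame.explode also works, as long as you replace the StartDate column itself with the ranges instead of adding a separate Dates column. This is a sketch assuming pandas 0.25 or later (when explode was added), reusing hypothetical sample data; to_csv with no path returns the CSV text, and passing a file name such as 'MyData_expanded.csv' (an example name) would write it to disk instead.

```python
import io

import pandas as pd

DT_FMT = '%m-%d-%Y %H:%M:%S'

# Hypothetical sample data standing in for 'MyData.csv'
CSV_TEXT = """StartDate,EndDate,A,B,C
01-02-2020 00:00:00,01-06-2020 00:00:00,ME,ME,ME
05-15-2021 00:00:00,05-18-2021 00:00:00,KI,KI,KI
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# Replace each StartDate with the full range of days up to EndDate,
# then turn every element of that range into its own row
df['StartDate'] = [pd.date_range(s, e) for s, e in zip(df['StartDate'], df['EndDate'])]
df = df.explode('StartDate').reset_index(drop=True)

# explode leaves an object column, so convert it back before formatting
df['StartDate'] = pd.to_datetime(df['StartDate'])
for c in ('StartDate', 'EndDate'):
    df[c] = df[c].dt.strftime(DT_FMT)

# With no path, to_csv returns the CSV text; pass a file name to write a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

index=False keeps pandas from writing the row index as an extra unnamed column, so the output has the same columns as the input file.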