
Get range of dates between specified start and end date from csv using python


I have a CSV file with StartDate and EndDate columns, for example 01-02-2020 00:00:00 and 01-03-2020 00:00:00.

I want a Python program that finds the dates between them and appends each one as a new row, like this:

(image: sample CSV file)

Here, instead of the dots, each new row should have StartDate incremented by one day, with EndDate kept as it is.

    import pandas as pd

    df = pd.read_csv('MyData.csv')

    df['StartDate'] = pd.to_datetime(df['StartDate'])
    df['EndDate'] = pd.to_datetime(df['EndDate'])
    df['Dates'] = [pd.date_range(x, y) for x, y in zip(df['StartDate'], df['EndDate'])]
    df = df.explode('Dates')
    df

For example, if I have StartDate 01-02-2020 00:00:00 and EndDate 05-02-2020 00:00:00,

the result should be:

(image: expected result)

All resulting datetimes should be in the same format as the StartDate and EndDate columns in MyData.csv.

Only StartDate changes; all other columns stay the same.

I tried doing it with pd.date_range, but I am not getting any result. Can anyone please help me with this?

Thanks


Solution

  • My two cents: a very simple solution based only on functions from pandas:

    import pandas as pd
    
    # Format of the dates in 'MyData.csv' (assumed month-first;
    # use '%d-%m-%Y %H:%M:%S' instead if the file is day-first)
    DT_FMT = '%m-%d-%Y %H:%M:%S'
    
    df = pd.read_csv('MyData.csv')
    
    # Parse dates with the provided format
    for c in ('StartDate', 'EndDate'):
        df[c] = pd.to_datetime(df[c], format=DT_FMT)
    
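    # Build one row per day from StartDate to EndDate (both inclusive).
    # row[0] and row[1] are used as StartDate and EndDate, so this assumes
    # they are the first two columns; d replaces StartDate, the rest is copied.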
    date_df = pd.DataFrame(
        data=[[d] + list(row[1:])
              for row in df.itertuples(index=False, name=None)
              for d in pd.date_range(row[0], row[1])],
        columns=df.columns.copy()
    )
    
    # Convert dates back to strings in the same format as 'MyData.csv'
    for c in ('StartDate', 'EndDate'):
        date_df[c] = date_df[c].dt.strftime(DT_FMT)
    
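Strings such as 01-02-2020 are ambiguous, so DT_FMT has to match the file. The answer assumes month-first, while the question's example (01-02-2020 to 05-02-2020) may well be day-first; which one MyData.csv uses is something to check against the real file. A minimal sketch of how the same text parses under both formats, and how strftime with the parsing format gives the original text back:

```python
import pandas as pd

raw = '01-02-2020 00:00:00'

# The same string parsed with two different format strings
month_first = pd.to_datetime(raw, format='%m-%d-%Y %H:%M:%S')
day_first = pd.to_datetime(raw, format='%d-%m-%Y %H:%M:%S')

print(month_first.date())  # 2020-01-02, i.e. 2 January
print(day_first.date())    # 2020-02-01, i.e. 1 February

# Formatting with the format used for parsing gives back the original text
print(day_first.strftime('%d-%m-%Y %H:%M:%S'))  # 01-02-2020 00:00:00
```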

    If, after parsing, df is:

       StartDate    EndDate   A   B   C
    0 2020-01-02 2020-01-06  ME  ME  ME
    1 2021-05-15 2021-05-18  KI  KI  KI
    

    then date_df will be:

                 StartDate              EndDate   A   B   C
    0  01-02-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
    1  01-03-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
    2  01-04-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
    3  01-05-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
    4  01-06-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
    5  05-15-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
    6  05-16-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
    7  05-17-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
    8  05-18-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
    
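The list comprehension above relies on row[0] and row[1] being StartDate and EndDate. A different, vectorized technique (swapped in here, not the answer's method) repeats each row with Index.repeat and adds a per-row day offset, looking the columns up by name. A minimal sketch, assuming a unique index on df, the same month-first DT_FMT, and a hypothetical inline CSV whose date columns are deliberately not first; rows with EndDate earlier than StartDate are dropped, just as pd.date_range returns nothing for them:

```python
import io

import pandas as pd

DT_FMT = '%m-%d-%Y %H:%M:%S'

# Hypothetical input: same rows as the example, but StartDate/EndDate
# are deliberately not the first two columns
csv_text = """A,StartDate,B,EndDate
ME,01-02-2020 00:00:00,ME,01-06-2020 00:00:00
KI,05-15-2021 00:00:00,KI,05-18-2021 00:00:00
"""
df = pd.read_csv(io.StringIO(csv_text))
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# Days per row, both ends inclusive (0 if EndDate < StartDate)
n_days = ((df['EndDate'] - df['StartDate']).dt.days + 1).clip(lower=0)

# Repeat each row once per day; assumes df has a unique index
date_df = df.loc[df.index.repeat(n_days.to_numpy())].copy()

# 0, 1, 2, ... within each original row, added to StartDate as days
offset = date_df.groupby(level=0).cumcount().to_numpy()
date_df['StartDate'] = date_df['StartDate'] + pd.to_timedelta(offset, unit='D')
date_df = date_df.reset_index(drop=True)

# Back to strings in the input format
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)

print(date_df)
```

Because every step works on whole columns, this avoids a Python-level loop over rows, which may help for large files; for small files the answer's version works just as well.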

    You can then save the result back to a CSV file with the to_csv method.
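A short sketch of that last step, using a hypothetical two-row stand-in for date_df and a hypothetical output name 'MyData_expanded.csv'. Passing index=False keeps pandas' row numbers out of the file, and since the date columns were already converted to strings with strftime, they are written in the original format. The sketch writes to an in-memory buffer so the output can be inspected:

```python
import io

import pandas as pd

# Hypothetical stand-in for date_df: the date columns are already
# strings in DT_FMT, so to_csv writes them exactly as shown
date_df = pd.DataFrame({
    'StartDate': ['01-02-2020 00:00:00', '01-03-2020 00:00:00'],
    'EndDate': ['01-03-2020 00:00:00', '01-03-2020 00:00:00'],
    'A': ['ME', 'ME'],
})

# index=False leaves out pandas' row labels (0, 1, ...);
# pass a path such as 'MyData_expanded.csv' instead of buf to write a file
buf = io.StringIO()
date_df.to_csv(buf, index=False)
print(buf.getvalue())
```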