Search code examples
pythonpandasdataframedateseries

Get next date after an arbitrary date in a Python Series


I have a DataFrame in Python that has a "date" column (and several other columns).

What I want to do is to split off two DFs based on an arbitrary date in the "date" column. Let this split point be arbitrary_date.

  1. All rows with a date before arbitrary_date
  2. Any rows with the one next date in the series (which may be arbitrary_date or another date up to a week in the future)

To be clear -

  • I don't know what the arbitrary date will be until runtime
  • I cannot split by index because I have no guarantee than any specific indices go into either DF
  • I don't know how many days will elapse between the arbitrary date and the next date
  • I don't know how many of the next date (after arbitrary_date) will be in the DF

Here's an example to give a bit of realism to the request.

Start with this DF.

date        data
20230101    e
20230108    E
20230115    S
20230117    L
20230117    f
20230118    u
20230118    B
20230121    r
20230127    n
20230128    o

If my arbitrary date is 2023-01-17 then I want the two DFs to be

20230101    e
20230108    E
20230115    S

and

20230117    L
20230117    f

But if my arbitrary date is 2023-01-22 then my DFs should be

20230101    e
20230108    E
20230115    S
20230117    L
20230117    f
20230118    u
20230118    B
20230121    r

and

20230127    n

How can I accomplish this dynamic split?


Solution

  • Try:

    def split_df(df, date):
        split_date = pd.to_datetime(date)
    
        next_date = df["date"].between(split_date, split_date + pd.Timedelta(days=7))
    
        df1 = df[df["date"] < split_date]
        try:
            df2 = df[df["date"].eq(df.loc[next_date, "date"].iat[0])]
        except IndexError:
            df2 = None
    
        return df1, df2
    
    
    df["date"] = pd.to_datetime(df["date"], format="%Y%M%d")
    
    print(*split_df(df, "2023-01-17"), sep="\n\n")
    print()
    
    print(*split_df(df, "2023-01-22"), sep="\n\n")
    print()
    
    print(*split_df(df, "2023-02-22"), sep="\n\n")
    

    Prints:

                     date data
    0 2023-01-01 00:01:00    e
    1 2023-01-08 00:01:00    E
    2 2023-01-15 00:01:00    S
    
                     date data
    3 2023-01-17 00:01:00    L
    4 2023-01-17 00:01:00    f
    
    
                     date data
    0 2023-01-01 00:01:00    e
    1 2023-01-08 00:01:00    E
    2 2023-01-15 00:01:00    S
    3 2023-01-17 00:01:00    L
    4 2023-01-17 00:01:00    f
    5 2023-01-18 00:01:00    u
    6 2023-01-18 00:01:00    B
    7 2023-01-21 00:01:00    r
    
                     date data
    8 2023-01-27 00:01:00    n
    
    
                     date data
    0 2023-01-01 00:01:00    e
    1 2023-01-08 00:01:00    E
    2 2023-01-15 00:01:00    S
    3 2023-01-17 00:01:00    L
    4 2023-01-17 00:01:00    f
    5 2023-01-18 00:01:00    u
    6 2023-01-18 00:01:00    B
    7 2023-01-21 00:01:00    r
    8 2023-01-27 00:01:00    n
    9 2023-01-28 00:01:00    o
    
    None