I have a DataFrame in Python that has a "date" column (and several other columns).
What I want to do is to split off two DFs based on an arbitrary date in the "date" column. Let this split point be arbitrary_date
.
arbitrary_date
arbitrary_date
or another date up to a week in the future)To be clear -
arbitrary_date
) will be in the DFHere's an example to give a bit of realism to the request.
Start with this DF.
date data
20230101 e
20230108 E
20230115 S
20230117 L
20230117 f
20230118 u
20230118 B
20230121 r
20230127 n
20230128 o
If my arbitrary date is 2023-01-17 then I want the two DFs to be
20230101 e
20230108 E
20230115 S
and
20230117 L
20230117 f
But if my arbitrary date is 2023-01-22 then my DFs should be
20230101 e
20230108 E
20230115 S
20230117 L
20230117 f
20230118 u
20230118 B
20230121 r
and
20230127 n
How can I accomplish this dynamic split?
Try:
def split_df(df, date):
split_date = pd.to_datetime(date)
next_date = df["date"].between(split_date, split_date + pd.Timedelta(days=7))
df1 = df[df["date"] < split_date]
try:
df2 = df[df["date"].eq(df.loc[next_date, "date"].iat[0])]
except IndexError:
df2 = None
return df1, df2
df["date"] = pd.to_datetime(df["date"], format="%Y%M%d")
print(*split_df(df, "2023-01-17"), sep="\n\n")
print()
print(*split_df(df, "2023-01-22"), sep="\n\n")
print()
print(*split_df(df, "2023-02-22"), sep="\n\n")
Prints:
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
date data
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
5 2023-01-18 00:01:00 u
6 2023-01-18 00:01:00 B
7 2023-01-21 00:01:00 r
date data
8 2023-01-27 00:01:00 n
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
5 2023-01-18 00:01:00 u
6 2023-01-18 00:01:00 B
7 2023-01-21 00:01:00 r
8 2023-01-27 00:01:00 n
9 2023-01-28 00:01:00 o
None