How do I fill the Date column so that when it detects a date it adds that date to the below rows, until it sees a new date starts adding that date?
Reproducible example:
Input:
Date Headline
0 Mar-20-21 04:03AM Apple CEO Cook, executives on tentative list o...
1 03:43AM Apple CEO Cook, execs on tentative list of wit...
2 Mar-19-21 10:19PM Dow Jones Futures: Why This Market Rally Is So...
3 06:13PM Zuckerberg: Apples Privacy Move Could Spur Mor...
4 05:45PM Apple (AAPL) Dips More Than Broader Markets: W...
5 04:17PM Facebook Stock Jumps As Zuckerberg Changes Tun...
6 04:03PM Best Dow Jones Stocks To Buy And Watch In Marc...
7 01:02PM The Nasdaq's on the Rise Friday, and These 2 S...
Desired Output:
Date Headline
0 Mar-20-21 04:03AM Apple CEO Cook, executives on tentative list o...
1 Mar-20-21 03:43AM Apple CEO Cook, execs on tentative list of wit...
2 Mar-19-21 10:19PM Dow Jones Futures: Why This Market Rally Is So...
3 Mar-19-21 06:13PM Zuckerberg: Apples Privacy Move Could Spur Mor...
4 Mar-19-21 05:45PM Apple (AAPL) Dips More Than Broader Markets: W...
5 Mar-19-21 04:17PM Facebook Stock Jumps As Zuckerberg Changes Tun...
6 Mar-19-21 04:03PM Best Dow Jones Stocks To Buy And Watch In Marc...
7 Mar-19-21 01:02PM The Nasdaq's on the Rise Friday, and These 2 S...
Attempt:
df['Time'] = [x[-7:] for x in df['Date']]
df['Date'] = [x[:-7] for x in df['Date']]
# Some code that fills the date
# Then convert to datetime
Before you use ffill()
you need to split the two columns to get the correct time, and only fill in the Date part. You will need to replace spaces with np.nan
to use ffill()
. Then put the columns back together and wrap that operation in pd.to_datetime
to get the correct dtype
.
Lastly you can drop the time column.
# Imports
import numpy as np
import pandas as pd
# Split the column
df[['Date','Time']] = df['Date'].str.split(' ',expand=True)
# Replace space with nan and use ffill()
df['Date'] = df['Date'].replace(r'^\s*$', np.nan, regex=True).ffill()
# Put the columns back and convert to datetime
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# Drop the time column
del(df['Time'])
Will get you back:
df
Date Headline
0 2021-03-20 04:03:00 Apple CEO Cook, executives on tentative list o...
1 2021-03-20 03:43:00 Apple CEO Cook, execs on tentative list of wit...
2 2021-03-19 22:19:00 Dow Jones Futures: Why This Market Rally Is So...
3 2021-03-19 18:13:00 Zuckerberg: Apples Privacy Move Could Spur Mor...
4 2021-03-19 17:45:00 Apple (AAPL) Dips More Than Broader Markets: W...
5 2021-03-19 16:17:00 Facebook Stock Jumps As Zuckerberg Changes Tun...
6 2021-03-19 16:03:00 Best Dow Jones Stocks To Buy And Watch In Marc...
7 2021-03-19 13:02:00 The Nasdaq's on the Rise Friday, and These 2 S...
EDIT
If you want your 'Date' to show exactly as you have it in your desired outcome, i.e. this format 'Mar-20-21', don't wrap it in pd.to_datetime()
and keep it as an object
:
df['Date'] = df['Date'] + ' ' + df['Time']
df
Date Headline
0 Mar-20-21 04:03AM Apple CEO Cook, executives on tentative list o...
1 Mar-20-21 03:43AM Apple CEO Cook, execs on tentative list of wit...
2 Mar-19-21 10:19PM Dow Jones Futures: Why This Market Rally Is So...
3 Mar-19-21 06:13PM Zuckerberg: Apples Privacy Move Could Spur Mor...
4 Mar-19-21 05:45PM Apple (AAPL) Dips More Than Broader Markets: W...
5 Mar-19-21 04:17PM Facebook Stock Jumps As Zuckerberg Changes Tun...
6 Mar-19-21 04:03PM Best Dow Jones Stocks To Buy And Watch In Marc...
7 Mar-19-21 01:02PM The Nasdaq's on the Rise Friday, and These 2 S...