I am sorry to bother you, but I am new to Python 3...
I am trying to parse an HTML table to get a list of tickers and dates, for which I would then like to pull stock prices from Yahoo...
I have a cell that contains some text followed by a date in the following format: April 20, 2020 ... I would like to extract just the date so I can use it with the Yahoo API afterwards, but I am getting errors with the following code:
date=result.find("td", attrs={'class':'column5'}).text.replace('\n',' ')
date=datetime.datetime.strptime(date,'%B %d, %Y').strftime('%Y-%m-%d')
Many thanks for any guidance!
To illustrate my comment: use a regex
to find all substrings matching the datetime format '%B %d, %Y',
and then convert each one to the desired format:
import re
from datetime import datetime
s = "April 20, 2020 April 3, 2020 March 18, 2020 February 29, 2020 March 29, 2019 March 19, 2019 1) September 20, 2018 - IPO ~20% 2) March 8, 2019 - exchange offer complete March 4, 2019 1) October 11, 2018 - IPO ~15% 2) March 1, 2019 - spinoff remaining stake February 25, 2019"
dates = re.findall(r'[a-zA-Z]+ [0-9]{1,2}, [0-9]{4}', s)
# ['April 20, 2020',
# 'April 3, 2020',
# 'March 18, 2020',
# 'February 29, 2020',
# 'March 29, 2019',
# 'March 19, 2019',
# 'September 20, 2018',
# 'March 8, 2019',
# 'March 4, 2019',
# 'October 11, 2018',
# 'March 1, 2019',
# 'February 25, 2019']
for d in dates:
    print(datetime.strptime(d,'%B %d, %Y').strftime('%Y-%m-%d'))
# 2020-04-20
# 2020-04-03
# 2020-03-18
# 2020-02-29
# 2019-03-29
# 2019-03-19
# 2018-09-20
# 2019-03-08
# 2019-03-04
# 2018-10-11
# 2019-03-01
# 2019-02-25
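If you want to reuse this for every cell in the table, the two steps above could be wrapped in a small helper. This is only a sketch: the names `extract_iso_dates`, `DATE_PATTERN` and `cell_text` are made up for illustration, and it assumes the default English month names for '%B'. It also skips any substring that matches the regex but is not a real date, since strptime raises ValueError in that case.

```python
import re
from datetime import datetime

# Hypothetical name: same pattern as above, compiled once for reuse.
DATE_PATTERN = re.compile(r'[A-Za-z]+ [0-9]{1,2}, [0-9]{4}')


def extract_iso_dates(text):
    """Return every 'Month D, YYYY' date found in text as 'YYYY-MM-DD'."""
    iso_dates = []
    for candidate in DATE_PATTERN.findall(text):
        try:
            parsed = datetime.strptime(candidate, '%B %d, %Y')
        except ValueError:
            # The regex matched, but the month name or day is not valid
            # (for example 'Stake 20, 2020'), so skip this candidate.
            continue
        iso_dates.append(parsed.strftime('%Y-%m-%d'))
    return iso_dates


# Hypothetical cell text, taken from part of the sample string above.
cell_text = "1) September 20, 2018 - IPO ~20% 2) March 8, 2019 - exchange offer complete"
print(extract_iso_dates(cell_text))
```

You would then call it on the text of each cell instead of passing the whole cell text to strptime directly.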