python-3.x, datetime, yahoo-finance

DateTime Parsing from string for Yahoo Finance


I am very sorry to bother you, but I am new to Python 3...

I am trying to parse an HTML table to get a list of tickers and dates, which I would then like to use to pull stock prices from Yahoo...

I have a cell that contains some text followed by a date in the following format: April 20, 2020 ... I would like to extract just the date so I can use it with the Yahoo API afterwards, but I am getting errors with the following code:

date = result.find("td", attrs={'class': 'column5'}).text.replace('\n', ' ')
date = datetime.datetime.strptime(date, '%B %d, %Y').strftime('%Y-%m-%d')

Many thanks for any guidance!


Solution

  • As an illustration of my comment: use a regex to find all substrings matching the datetime format '%B %d, %Y', and then convert them to the desired format:

    import re
    from datetime import datetime
    
    s = "April 20, 2020 April 3, 2020 March 18, 2020 February 29, 2020 March 29, 2019 March 19, 2019 1) September 20, 2018 - IPO ~20% 2) March 8, 2019 - exchange offer complete March 4, 2019 1) October 11, 2018 - IPO ~15% 2) March 1, 2019 - spinoff remaining stake February 25, 2019"
    
    # raw string: the spaces need no escaping, and '\ ' is an invalid escape in a normal string
    dates = re.findall(r'[a-zA-Z]+ [0-9]{1,2}, [0-9]{4}', s)
    # ['April 20, 2020',
    #  'April 3, 2020',
    #  'March 18, 2020',
    #  'February 29, 2020',
    #  'March 29, 2019',
    #  'March 19, 2019',
    #  'September 20, 2018',
    #  'March 8, 2019',
    #  'March 4, 2019',
    #  'October 11, 2018',
    #  'March 1, 2019',
    #  'February 25, 2019']
    
    for d in dates:
        print(datetime.strptime(d,'%B %d, %Y').strftime('%Y-%m-%d'))
    # 2020-04-20
    # 2020-04-03
    # 2020-03-18
    # 2020-02-29
    # 2019-03-29
    # 2019-03-19
    # 2018-09-20
    # 2019-03-08
    # 2019-03-04
    # 2018-10-11
    # 2019-03-01
    # 2019-02-25
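
  • If it helps, the two steps above could be wrapped in a small helper that you call on the cell text. This is only a sketch: the name extract_iso_dates is made up for illustration, and it assumes an English locale, since '%B' matches month names in the current locale. Because the pattern accepts any word in front of the day, the helper skips matches that strptime rejects instead of raising an error:

```python
import re
from datetime import datetime

# A word, a 1-2 digit day, a comma and a 4-digit year, e.g. "April 20, 2020".
DATE_PATTERN = re.compile(r'[A-Za-z]+ [0-9]{1,2}, [0-9]{4}')


def extract_iso_dates(text):
    """Return every '%B %d, %Y' date found in text as a 'YYYY-MM-DD' string.

    Hypothetical helper, not part of any library.
    """
    iso_dates = []
    for candidate in DATE_PATTERN.findall(text):
        try:
            parsed = datetime.strptime(candidate, '%B %d, %Y')
        except ValueError:
            # The regex also matches non-dates such as "offer 12, 2019"; skip them.
            continue
        iso_dates.append(parsed.strftime('%Y-%m-%d'))
    return iso_dates


cell_text = "1) September 20, 2018 - IPO ~20% 2) March 8, 2019 - exchange offer complete"
print(extract_iso_dates(cell_text))
# ['2018-09-20', '2019-03-08']
```

    With the code from the question, that would mean passing in the cleaned cell text (the result of .text.replace('\n', ' ')) and picking whichever of the returned dates you need.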