python datetime web-scraping beautifulsoup strptime

Scraping only the date from a p tag with Python and BeautifulSoup: `.split('.', "")` is not working


I am scraping a date from a website using Python and BeautifulSoup, and I am facing a splitting issue: .split('.', "") does not work for extracting only the date from this p tag: <p class="text-xs">Oct 24, 2017 • 4 min read</p>. I don't want the dot or the "4 min read" part from this p tag.

Published_Date = soup.select_one('p[class="text-xs"]').get('datetime')

Solution

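    1. The big bold dot in the text is a bullet character (•, U+2022), which is different from the period (.) you are passing to the split() method, so the string is never split. In addition, the second argument of split() is maxsplit, which must be an integer, so passing "" raises a TypeError.

    2. So replace the bullet with a simple delimiter such as |, split on that delimiter, and take the first item of the resulting list by indexing with [0].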

    Example:

    from bs4 import BeautifulSoup
    
    html ='''
    <p class="text-xs">Oct 24, 2017 • 4 min read</p>
    
    '''
    
    soup = BeautifulSoup(html,'html.parser')
    
    date = soup.select_one('p.text-xs').get_text(strip=True)
    # Swap the bullet for '|', split on it, keep the first part, and trim the trailing space
    print(date.replace('•', '|').split('|')[0].strip())
    

    Output:

    Oct 24, 2017
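
Since the question is also tagged strptime, here is a minimal sketch of going one step further and turning the text into a real date object. It works on the string that get_text() returns, splits on the bullet directly instead of replacing it first, and assumes the date always looks like "Oct 24, 2017" with an English month abbreviation. The helper name extract_date is hypothetical and not part of the original answer.

```python
from datetime import date, datetime


def extract_date(text: str) -> date:
    """Return the date that appears before the bullet (U+2022) in `text`.

    Hypothetical helper: assumes the date part is formatted like
    "Oct 24, 2017" (abbreviated English month, day, four-digit year).
    """
    # partition() splits at the first bullet; if there is no bullet,
    # the whole string ends up in date_part.
    date_part, _, _ = text.partition("\u2022")
    # "%b" matches abbreviated month names and depends on the current locale.
    return datetime.strptime(date_part.strip(), "%b %d, %Y").date()


# Same text that get_text(strip=True) returns for the p tag in the question
published = extract_date("Oct 24, 2017 • 4 min read")
print(published)  # 2017-10-24
```

Parsing into a date object makes it easier to sort, compare, or reformat the value later, but if the site ever uses a different date format, strptime will raise a ValueError and the format string has to be adjusted.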