python, web-scraping, beautifulsoup, html-parsing

Use BeautifulSoup to get a value after a specific tag


I'm having a hard time getting BeautifulSoup to scrape some data for me. What's the best way to get the date (the actual number, 2008) from the sample below? This is my first time using BeautifulSoup. I've figured out how to scrape URLs off the page, but I can't narrow the search down to the tag containing the word Date and then return only the numeric date that follows it (inside the dd tag). Is what I'm asking even possible?

<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>

Solution

  • Find the dt tag by text and find the next dd sibling:

    soup.find('div', class_='detail_date').find('dt', text='Date').find_next_sibling('dd').text.strip()
    

    The complete code:

    from bs4 import BeautifulSoup
    
    data = """
    <div class='dl_item_container clearfix detail_date'>
        <dt>Date</dt>
        <dd>
        2008
        </dd>
    </div>
    """
    
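    # Parse the markup with the parser bundled with Python
    soup = BeautifulSoup(data, 'html.parser')
    # Locate the <dt> whose text is exactly 'Date' inside the detail_date container
    # (Beautiful Soup 4.4.0+ also accepts string='Date' in place of text='Date')
    date_field = soup.find('div', class_='detail_date').find('dt', text='Date')
    # The value sits in the <dd> that follows; strip the surrounding whitespace
    print(date_field.find_next_sibling('dd').text.strip())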
    

    Prints 2008.
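
If the page might not always contain the Date entry, the chained calls above raise an AttributeError as soon as one find returns None. A more defensive variant could look like the sketch below. The helper name value_after_label and the 'Author' lookup are made up for illustration and are not part of BeautifulSoup or of the original page.

```python
from bs4 import BeautifulSoup

data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>
"""


def value_after_label(soup, label):
    """Hypothetical helper: text of the <dd> following <dt>label</dt>, or None."""
    for dt in soup.find_all('dt'):
        # Compare stripped text so stray whitespace inside the <dt> is ignored
        if dt.get_text(strip=True) == label:
            dd = dt.find_next_sibling('dd')
            return dd.get_text(strip=True) if dd is not None else None
    # No <dt> with that label was found
    return None


soup = BeautifulSoup(data, 'html.parser')
print(value_after_label(soup, 'Date'))    # label present in the sample
print(value_after_label(soup, 'Author'))  # label absent, so None
```

With the sample markup, the first call should give 2008 and the second None, so the caller can test for a missing field instead of catching an exception.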