Tags: python, beautifulsoup, html-parsing

How to extract a specific part of HTML using BeautifulSoup?


I am trying to extract the value of the 'title' attribute from the following HTML, but so far I haven't managed to.

<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00">

This is my code:

from bs4 import BeautifulSoup

with open("messages.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

results = soup.find_all('div', attrs={'class':'pull_right date details'})

print(results)

The output is a list of all the matching <div> elements in the HTML file, not the 'title' values.


Solution

  • To access the value of the title attribute, index the tag like a dictionary: ['title'].

    find_all returns a list of tags, so you need to pick an element first (e.g. [0]['title']). find returns a single tag, so you can index it directly.

    For example:

    from bs4 import BeautifulSoup
    
    # Inline sample standing in for the contents of messages.html
    html = '<html><div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"></div></html>'
    soup = BeautifulSoup(html, 'html.parser')
    
    results = soup.find_all('div', attrs={'class':'pull_right date details'})
    
    print(results[0]['title'])
    

    Or:

    # find returns only the first matching tag (or None if nothing matches)
    result = soup.find('div', attrs={'class':'pull_right date details'})
    
    print(result['title'])
    

    Output (each of the two snippets prints the same line):

    22.12.2022 01:49:03 UTC-03:00
    22.12.2022 01:49:03 UTC-03:00
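
Since the real messages.html presumably contains many of these divs, you will probably want every title rather than just the first. Below is a minimal sketch of one way to do that. The sample markup, the second timestamp, and the div without a title are made up for illustration. It uses a CSS selector via select, which matches the three classes in any order, and .get('title'), which returns None instead of raising KeyError when a tag has no title attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for messages.html (only the first div is from the question)
html = """
<html><body>
<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00">01:49</div>
<div class="pull_right date details" title="22.12.2022 01:50:10 UTC-03:00">01:50</div>
<div class="pull_right date details">no title here</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: divs that carry all three classes, regardless of their order
divs = soup.select("div.pull_right.date.details")

# .get() returns None when the attribute is missing, so those divs are skipped
titles = [div.get("title") for div in divs if div.get("title") is not None]

print(titles)
# ['22.12.2022 01:49:03 UTC-03:00', '22.12.2022 01:50:10 UTC-03:00']
```

If you keep find_all with attrs={'class': '...'} instead, the same loop works on its result. The only difference is that passing the full class string matches the class attribute exactly as written, so the order of the classes matters there.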