python html beautifulsoup tags

How to extract the time tag from HTML in Python?


I parsed a piece of HTML in Python with BeautifulSoup, but I am unable to retrieve the desired time tag from it.

The HTML is stored in a variable called K:

<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>    

I can extract all tags except time:

K.a:
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>

K.li:
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>

K.time:
Nothing prints

I have also tried the following solution:

K.find('time', {'class':'dtstart'})
Nothing prints

K.find('a', {'class':'action pull-right print-cat'})
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>

When we inspect K we see the following:

Signature:      K(*args, **kwargs)
Type:           Tag
String form:   
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>  
Length:         5
File:           ~/.local/lib/python3.6/site-packages/bs4/element.py
Source:    

How is it possible that the time tag isn't being extracted?


Solution

  • I am still unsure why I could not retrieve it in the first place, but Chris Doyle paved the way to success. We can simply re-parse K (make a new soup from it) and get the desired result:

    # Assumes `soup` is the common alias for the BeautifulSoup class
    from bs4 import BeautifulSoup as soup

    # Re-parse K's markup so <time> becomes a descendant of the new soup
    Date = soup(str(K), "html.parser").time.attrs["datetime"]
    print(Date)

    # Output
    05 December 201710:30 AM GMT
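
A likely explanation, judging from the inspection output in the question (Type: Tag, with a string form that starts at `<time ...>`): K already is the time tag. `K.time` and `K.find('time', ...)` only search the descendants of K, and there is no second time tag nested inside it, so both return None. If that reading is right, the attributes can be read from K directly without re-parsing. The sketch below rebuilds a K like the one in the question from a trimmed copy of the markup; the wrapping `<div>` and the variable names are assumptions made only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; the outer <div> is hypothetical
html = """<div><time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" href="/en/auctions/ecatalogue/lot.print.L17407.html">Print My Catalogue (0)</a>
</time></div>"""

doc = BeautifulSoup(html, "html.parser")
K = doc.find("time", {"class": "dtstart"})  # K is the <time> Tag itself

tag_name = K.name          # name of the tag that K refers to
nested_time = K.time       # looks for a <time> *inside* K, finds none
event_date = K["datetime"] # read the attribute straight from K
event_attrs = K.attrs      # all attributes of K as a dict

print(tag_name)
print(nested_time)
print(event_date)
```

This would also explain why `K.a` and `K.li` work: those tags are descendants of K, while the time tag is K itself. Re-parsing `str(K)` works for the same reason, because in the new soup the time tag becomes a descendant of the document.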