python web-scraping beautifulsoup html-parsing

Getting the href of an <a> tag that is inside an <li>


How can I get the href of every <a> tag inside the <li> elements with the class "subforum" in the HTML below?

<li class="subforum">
<a href="Link1">Link1 Text</a>
</li>
<li class="subforum">
<a href="Link2">Link2 Text</a>
</li>
<li class="subforum">
<a href="Link3">Link3 Text</a>
</li>

I tried this code, but it didn't work.

Bs = BeautifulSoup(requests.get(url).text,"lxml")
Class = Bs.findAll('li', {'class': 'subforum"'})
for Sub in Class:
    print(Link.get('href'))

Solution

  • The href attribute belongs to the <a> tag, not the <li> tag. Use li.a to get the first <a> tag inside each <li>.

    Documentation: Navigating using tag names

    import bs4
    
    html = '''<li class="subforum">
    <a href="Link1">Link1 Text</a>
    </li>
    <li class="subforum">
    <a href="Link2">Link2 Text</a>
    </li>
    <li class="subforum">
    <a href="Link3">Link3 Text</a>
    </li>'''
    
    soup = bs4.BeautifulSoup(html, 'lxml')
    for li in soup.find_all(class_="subforum"):
        print(li.a.get('href'))
    

    Output:

    Link1
    Link2
    Link3
    

    Why use class_:

    It’s very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, "class", is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_.
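
To make the class_ note concrete, here is a minimal sketch comparing the keyword form with the attribute dictionary the question attempted and with a CSS selector. The sample HTML is made up for illustration (the "other" item and the link-less item are not from the question), and it assumes "html.parser" instead of "lxml" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Illustrative markup: two subforum links, one unrelated item,
# and one subforum item that contains no <a> tag at all.
html = """
<li class="subforum"><a href="Link1">Link1 Text</a></li>
<li class="subforum"><a href="Link2">Link2 Text</a></li>
<li class="other"><a href="Skip">Not a subforum</a></li>
<li class="subforum">No link here</li>
"""

# "html.parser" ships with Python; "lxml" should behave the same on this input.
soup = BeautifulSoup(html, "html.parser")

# 1. Keyword argument with the trailing underscore.
by_keyword = soup.find_all("li", class_="subforum")

# 2. Attribute dictionary, as in the question but without the stray quote.
by_dict = soup.find_all("li", {"class": "subforum"})

# 3. CSS selector: <a> tags with an href that are direct children of li.subforum.
links_css = [a["href"] for a in soup.select("li.subforum > a[href]")]

# li.a is None when an <li> has no <a>, so check before calling .get().
links_nav = [li.a.get("href") for li in by_keyword if li.a is not None]

print(links_nav)  # ['Link1', 'Link2']
```

The first two calls match the same <li> elements, so the choice between them is mostly a matter of taste. The None check matters on real pages, where a matching <li> is not guaranteed to contain a link.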