python web-scraping beautifulsoup

How to get <li> tag information (BeautifulSoup Webscraping)?


I am scraping information from this page:
https://lawyers.justia.com/lawyer/michael-paul-ehline-85006
I am trying to scrape everything under the Fees section. What I want is the following information:

Free Consultation: Yes
Credit Cards Accepted: Visa, Mastercard, American Express
Contingent Fees: In personal injury cases only.
Rates, Retainers and Additional Information: Rates vary on a case by case basis.

This is what I have tried:

for thing in soup.findAll('ul', attrs={"class": "has-no-list-styles"}):
    ul = thing.find('li')  # find() takes a tag name ('li'), not markup ('<li>')
    print(ul)

but the output is:

<li>Intellectual Property</li>
<li>Copyright Law</li>
<li><strong>English</strong></li>

Thank you in advance.

UPDATE: I found a solution, but it runs into an infinite loop. Any suggestions?

for o in soup.findAll('div', attrs={"class": "block-wrapper"}):     
    for tag in soup.findAll('div', attrs={"class": "block-wrapper"}):
        if tag.string:
            tag.string.replace_with("")
        for de in o.findAll("li"):
            if de != []:
                de=remove_tags(str(de))
                print (de)

Solution

  • Try this, using the simplified_scrapy library:

    from simplified_scrapy import SimplifiedDoc,req
    html = req.get('https://lawyers.justia.com/lawyer/michael-paul-ehline-85006')
    doc = SimplifiedDoc(html)
    # Use class="jicon -large jicon-fee" to locate the Fees section
    ul = doc.getElement('ul',attr='class',value='has-no-list-styles',start='class="jicon -large jicon-fee"')
    print(ul.text)

    Result:

    Free ConsultationYesCredit Cards AcceptedVisa, Mastercard, American ExpressContingent FeesIn personal injury cases only.Rates, Retainers and Additional InformationRates vary on a case by case basis.
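
The same idea can be sketched with BeautifulSoup itself, since that is what the question uses. This is only a sketch under assumptions: SAMPLE_HTML is a made-up fragment that imitates the structure implied above (a `jicon-fee` icon followed by a `ul` with class `has-no-list-styles`), and `fee_items` is a hypothetical helper name. The real page markup may differ, so check it in the browser's inspector before relying on these selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page structure; NOT copied from the real site.
SAMPLE_HTML = """
<div class="block-wrapper">
  <h3><span class="jicon -large jicon-gavel"></span>Practice Areas</h3>
  <ul class="has-no-list-styles">
    <li>Intellectual Property</li>
    <li>Copyright Law</li>
  </ul>
</div>
<div class="block-wrapper">
  <h3><span class="jicon -large jicon-fee"></span>Fees</h3>
  <ul class="has-no-list-styles">
    <li><strong>Free Consultation</strong> Yes</li>
    <li><strong>Credit Cards Accepted</strong> Visa, Mastercard, American Express</li>
    <li><strong>Contingent Fees</strong> In personal injury cases only.</li>
    <li><strong>Rates, Retainers and Additional Information</strong> Rates vary on a case by case basis.</li>
  </ul>
</div>
"""


def fee_items(html):
    """Return the text of every <li> in the list that follows the fee icon."""
    soup = BeautifulSoup(html, "html.parser")

    # Anchor on the fee icon so the other "has-no-list-styles" lists are skipped.
    icon = soup.select_one(".jicon-fee")
    if icon is None:
        return []

    # First matching <ul> that appears after the icon in document order.
    ul = icon.find_next("ul", class_="has-no-list-styles")
    if ul is None:
        return []

    # find_all() returns every <li>; find() would return only the first one.
    return [li.get_text(" ", strip=True) for li in ul.find_all("li")]


for item in fee_items(SAMPLE_HTML):
    print(item)
```

Anchoring on the icon avoids looping over every `block-wrapper` div, and passing a separator to `get_text` keeps a space between each label and its value, unlike the run-together output above.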