python beautifulsoup findall

Webscraping: Issues with using findAll in BeautifulSoup


I am trying to grab all the languages listed on this page: https://lawyers.justia.com/lawyer/ali-shahrestani-esq-198352

The line of code I have gives me only part of what I want:

soup.findAll("div",{"class":"block-wrapper block"})

Output: '[English: Spoken, Written]'

Based on the tags, I have also tried:

soup.findAll("ul",{"class":"has-no-list-styles"})

Output: 'Personal InjuryProducts LiabilityElder LawConsumer LawDUI & DWIEmployment Law'


Solution

  • This should do it, I think:

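    import requests
    from bs4 import BeautifulSoup as bs

    url = 'https://lawyers.justia.com/lawyer/ali-shahrestani-esq-198352'
    data = requests.get(url)

    soup = bs(data.text, 'lxml')  # the 'lxml' parser must be installed separately
    # Collect the section headings, matched by their full class string
    target = soup.find_all("div", {"class": "heading-3 block-title iconed-heading font-w-bold"})
    for t in target:
        # Keep only the heading that contains the languages icon
        if t.find('span', class_="jicon -large jicon-languages"):
            # The language list is taken from the element right after that heading
            langs = t.find_next_sibling()
            for lang in langs.find_all('li'):
                print(lang.text)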

    Output:

    English: Spoken, Written
    French: Spoken, Written
    Italian: Spoken, Written
    Persian: Spoken
    Spanish: Spoken, Written
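
The same idea can be tried offline, without hitting the site. The sketch below is only an illustration: the HTML string is hypothetical markup written to mimic the structure the code above relies on (a heading div that holds the languages icon, followed by a sibling that holds the list), and `extract_languages` is a made-up helper name. It uses the built-in `html.parser` so that `lxml` is not required, and a CSS selector on the single class `jicon-languages` instead of the full class string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page): each
# block has a heading <div> with an icon <span>, followed by a sibling list.
HTML = """
<div class="block-wrapper block">
  <div class="heading-3 block-title iconed-heading font-w-bold">
    <span class="jicon -large jicon-languages"></span>Languages
  </div>
  <ul class="has-no-list-styles">
    <li>English: Spoken, Written</li>
    <li>Persian: Spoken</li>
  </ul>
</div>
<div class="block-wrapper block">
  <div class="heading-3 block-title iconed-heading font-w-bold">
    <span class="jicon -large jicon-practice"></span>Practice Areas
  </div>
  <ul class="has-no-list-styles">
    <li>Personal Injury</li>
  </ul>
</div>
"""


def extract_languages(html):
    """Return the text of each <li> that follows the languages heading."""
    soup = BeautifulSoup(html, "html.parser")

    # Match on one class only, so the order of the other classes is irrelevant.
    icon = soup.select_one("span.jicon-languages")
    if icon is None:
        return []

    # Climb to the heading <div>, then step to the next sibling tag.
    # Passing True matches any tag and skips whitespace text nodes.
    heading = icon.find_parent("div")
    listing = heading.find_next_sibling(True)
    if listing is None:
        return []

    return [li.get_text(strip=True) for li in listing.find_all("li")]


languages = extract_languages(HTML)
print(languages)
```

Starting from the icon rather than from a full class string narrows the search to the one block of interest, which is why the practice areas list in the second block is not picked up. If the real page nests its elements differently from this sample, the `find_parent` and `find_next_sibling` steps would need adjusting.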