Tags: python, python-3.x, beautifulsoup, html-parsing

Sorting elements in the same class by their branch and ancestors


I've got the following HTML (the actual names name*, name**, and name*** are not known in advance):

    <div class="one">nameA
        <div class="two">nameAA
            <a class="three">nameAAA</a>
            <a class="three">nameAAB</a>
        </div>
        <div class="two">nameAB
            <a class="three">nameABA</a>
            <a class="three">nameABB</a>
        </div>
    </div>
    <div class="one">nameB
        <div class="two">nameBA
            <a class="three">nameBAA</a>
            <a class="three">nameBAB</a>
        </div>
        <div class="two">nameBB
            <a class="three">nameBBA</a>
            <a class="three">nameBBB</a>
        </div>
    </div>

and I'm trying to build this dictionary:

    names = {nameA: [nameAAA, nameAAB, nameABA, nameABB], nameB: [nameBAA, nameBAB, nameBBA, nameBBB]}

I'm using BeautifulSoup's select function, but I cannot link the names it returns for the descendant class "three" to the names of their ancestors in class "one". The result of my code is currently:

    wordOnesText = [nameA, nameB]
    wordThreesText = [nameAAA, nameAAB, nameABA, nameABB, nameBAA, nameBAB, nameBBA, nameBBB]

    import bs4
    import requests

    res = requests.get('address')
    soup = bs4.BeautifulSoup(res.text, features='html.parser')
    wordOnes = soup.select('.one')
    # Either selector returns every "three" element in one flat list
    wordThrees = soup.select('.three') or soup.select('.one > .two > .three')

Could you help me link these two lists in a dictionary?


Solution

  • Try the following code.

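    from bs4 import BeautifulSoup

    # data holds the HTML source as a string, e.g. res.text from the question
    itemdict = {}
    soup = BeautifulSoup(data, 'lxml')  # the 'lxml' parser needs the lxml package
    for item in soup.select('.one'):
        # The first child of each "one" element is its own text, e.g. "nameA"
        name = item.contents[0].strip()
        # Search only inside this "one" element, so every match belongs to it
        itemdict[name] = [child.text for child in item.select('.three')]

    print(itemdict)

    This should print: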

    {'nameA': ['nameAAA', 'nameAAB', 'nameABA', 'nameABB'], 'nameB': ['nameBAA', 'nameBAB', 'nameBBA', 'nameBBB']}
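    The key idea in the answer is that select can be called on an individual element, which limits the search to that element's descendants. Below is a minimal, self-contained sketch of that idea. It rests on some assumptions that are not in the original post: the HTML constant is a made-up, well-formed copy of the question's markup, the helper names first_text, group_top_down and group_bottom_up are hypothetical, and the built-in 'html.parser' is used in place of 'lxml' so that no extra parser package is needed. The second function shows an alternative direction, walking up from each "three" element to its "one" ancestor with find_parent.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a well-formed copy of the structure in the question.
HTML = """
<div class="one">nameA
    <div class="two">nameAA
        <a class="three">nameAAA</a>
        <a class="three">nameAAB</a>
    </div>
    <div class="two">nameAB
        <a class="three">nameABA</a>
        <a class="three">nameABB</a>
    </div>
</div>
<div class="one">nameB
    <div class="two">nameBA
        <a class="three">nameBAA</a>
        <a class="three">nameBAB</a>
    </div>
    <div class="two">nameBB
        <a class="three">nameBBA</a>
        <a class="three">nameBBB</a>
    </div>
</div>
"""


def first_text(tag):
    """Return the first non-empty piece of text inside the tag, stripped."""
    return next(tag.stripped_strings, "")


def group_top_down(soup):
    """For each "one" element, collect the "three" elements found inside it."""
    return {
        first_text(one): [three.get_text(strip=True) for three in one.select(".three")]
        for one in soup.select(".one")
    }


def group_bottom_up(soup):
    """Walk up from each "three" element to its nearest "one" ancestor."""
    groups = {}
    for three in soup.select(".three"):
        one = three.find_parent(class_="one")
        if one is not None:  # skip "three" elements that sit outside any "one"
            groups.setdefault(first_text(one), []).append(three.get_text(strip=True))
    return groups


soup = BeautifulSoup(HTML, "html.parser")
names = group_top_down(soup)
print(names)
```

    On this sample both functions should build the same dictionary as the one printed above. The top-down version is probably the simpler choice when every "three" element is known to sit inside a "one" element; the bottom-up version may be handier when you already have the flat list of "three" elements, as in the question.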