python, web-scraping, beautifulsoup, html-parsing

How can I get the second span using BeautifulSoup in Python?


I'm trying to get the value of the second span in this div, and in others like it:

<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>

I've looked at similar Stack Overflow posts, but I still couldn't figure out how to fix this. Here's my current code:

time = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for i in time:
    print(i.text)  # this prints VALUE 1 x amount of times (there are multiple divs)

I've tried things like i.span, i.contents, i.children, etc. I'd really appreciate any help, thanks!


Solution

  • There are several ways to get the value you want. The example below uses the simplified_scrapy library (SimplifiedDoc) rather than BeautifulSoup:

    from simplified_scrapy.simplified_doc import SimplifiedDoc

    html = '''
    <div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
        <span>VALUE 1</span>
        <i aria-hidden="true" class="Mx(4px)">•</i>
        <span>TRYING TO GET THIS</span>
    </div>
    '''
    doc = SimplifiedDoc(html)
    divs = doc.getElementsByClass('C(#959595) Fz(11px) D(ib) Mb(6px)')
    for div in divs:
        # Use start to skip the first span
        value = div.getElementByTag('span', start='</span>')
        print(value)
        # Locate the last span
        value = div.getElementByTag('span', before='<span>', end=len(div.html))
        print(value)
        # Use <i> to locate the span that follows it
        value = div.i.next
        print(value)
        # Take the last item from the list of spans
        value = div.spans[-1]
        print(value)
        print(value.text)
    

    Result:

    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    TRYING TO GET THIS
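
Since the question asks about BeautifulSoup specifically, here is a minimal sketch of the same idea with bs4. It assumes the bs4 package is installed and that every matching div has the same layout as the snippet in the question. The names second_spans and spans_after_icon are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})

# Option 1: collect every span inside the div and take the second by index.
second_spans = []
for div in divs:
    spans = div.find_all('span')
    if len(spans) >= 2:  # skip divs that do not have a second span
        second_spans.append(spans[1].get_text(strip=True))

# Option 2: locate the <i> separator and take the span that follows it.
spans_after_icon = []
for div in divs:
    icon = div.find('i')
    if icon is not None:
        span = icon.find_next_sibling('span')
        if span is not None:
            spans_after_icon.append(span.get_text(strip=True))

print(second_spans)
print(spans_after_icon)
```

Indexing is the shortest route when the span order is fixed. Going through the `<i>` element may hold up better if the number of spans before the separator varies between divs.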