Extract tags from soup with BeautifulSoup


'''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
<div class="kt-post-card__bottom">
<span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
</div>
</div>
'''

Given the HTML above, I want to select every "kt-post-card__body" element and then, from each one, extract the contents of:

("kt-post-card__title", "kt-post-card__description")

as a list.

I tried this:

ads = soup.find_all('div',{'class':'kt-post-card__body'})

but with ads[0].div I can only access "kt-post-card__title", even though "kt-post-card__body" has other child tags such as "kt-post-card__description" and "kt-post-card__bottom". Why is that?


Solution

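  • ads[0].div is shorthand for ads[0].find('div'), so it returns only the first <div> inside the element; the other child tags are still there. The question is not entirely clear about what should be extracted, so here are both readings. To extract the classes of all nested tags: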

    for e in soup.select('.kt-post-card__body'):
        print([c for t in e.find_all() for c in t.get('class', [])])
    

    Output:

    ['kt-post-card__title', 'kt-post-card__description', 'kt-post-card__bottom', 'kt-post-card__bottom-description', 'kt-text-truncate']
    

    To get the texts, iterate over the ResultSet as well and either read the text of each element you need to fill your list, or use stripped_strings.

    Example
    from bs4 import BeautifulSoup
    
    html_doc='''
    <div class="kt-post-card__body">
    <div class="kt-post-card__title">Example_1</div>
    <div class="kt-post-card__description">Example_2</div>
    <div class="kt-post-card__bottom">
    <span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
    </div>
    </div>
    '''
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    # Collect title and description text from every card body
    for e in soup.select('.kt-post-card__body'):
        data = [
            e.select_one('.kt-post-card__title').text,
            e.select_one('.kt-post-card__description').text
        ]
        print(data)
    

    Output:

    ['Example_1', 'Example_2']
    

    or, to collect every non-empty string inside the element:

    print(list(e.stripped_strings))
    

    Output:

    ['Example_1', 'Example_2', 'Example_4']
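
To tie this back to the original question: the sketch below illustrates why ads[0].div yields only the title, and one possible way to collect the two wanted fields per card. It reuses the sample HTML from the answer. The helper name card_fields and its selectors parameter are hypothetical, introduced here for illustration only, and returning None for a missing element is an assumption about the desired behaviour.

```python
from bs4 import BeautifulSoup

html_doc = '''
<div class="kt-post-card__body">
<div class="kt-post-card__title">Example_1</div>
<div class="kt-post-card__description">Example_2</div>
<div class="kt-post-card__bottom">
<span class="kt-post-card__bottom-description kt-text-truncate" title="Example_3">Example_4</span>
</div>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
card = soup.select_one('.kt-post-card__body')

# Attribute access is shorthand for find(): only the first <div> is returned.
first_div = card.div
same_div = card.find('div')

# recursive=False restricts the search to direct children of the card.
child_classes = [
    child['class'][0] for child in card.find_all('div', recursive=False)
]


def card_fields(card, selectors=('.kt-post-card__title',
                                 '.kt-post-card__description')):
    """Hypothetical helper: text for each selector, None when it is missing."""
    fields = []
    for selector in selectors:
        node = card.select_one(selector)
        fields.append(node.get_text(strip=True) if node else None)
    return fields


# One list of fields per card body found in the document.
rows = [card_fields(c) for c in soup.select('.kt-post-card__body')]

print(first_div is same_div)
print(child_classes)
print(rows)
```

Guarding against a missing element avoids the AttributeError that select_one(...).text would raise when a card lacks a title or description.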