python html beautifulsoup article

How can I get the same data from different divs?


How can I reach the text div? The markup below comes from a website, and I would like to get data from several different divs. There are multiple article2 divs inside the col... divs, and I need the text from every one of them. My code doesn't work because I don't know how to reach the different divs (the col_6... and col_3... divs) at the same time.

My code:

article_title = div.find('div', attrs={'class':'article2'}).find('div', attrs={'class':'text'}).find('h1')

The site code:

<div class="row">
        <div class="col_6 ct_12 cd_12 cm_12">
            <a href="https://kronikaonline.ro/erdelyi-hirek/uralkodasanak-helyszinen-a-gyulafehervari-varban-allitanak-emleket-bethlen-gabor-erdelyi-fejedelemnek">
            <div class="article2" style="padding-top:0px;">
               <div class="text">
               <h1>TITLE</h1>
               </div>
            </div>
            </a>
        </div>
        <div class="col_3 ct_12 cd_12 cm_12">
        </div>
   </div>

Solution

  • You can use find_all() or CSS selectors to select all of the articles, then iterate over the ResultSet to extract the information you want to scrape:

    for a in soup.select('div.article2'):
        print(f"title: {a.h1.text}")
        print(f"url: {a.find_previous('a').get('href')}")
    

    Example

    Extract the data and store it in a list of dicts:

    from bs4 import BeautifulSoup
    html = '''
    <div class="row">
            <div class="col_6 ct_12 cd_12 cm_12">
                <a href="url1">
                <div class="article2" style="padding-top:0px;">
                   <div class="text">
                   <h1>TITLE1</h1>
                   </div>
                </div>
                </a>
            </div>
            <div class="col_6 ct_12 cd_12 cm_12">
                <a href="url2">
                <div class="article2" style="padding-top:0px;">
                   <div class="text">
                   <h1>TITLE2</h1>
                   </div>
                </div>
                </a>
            </div>
       </div>
    '''
    # Name the parser explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(html, 'html.parser')
    
    data = []
    
    for a in soup.select('div.article2'):
        data.append({
            'title': a.h1.text,
            'url': a.find_previous('a').get('href')
        })
    
    print(data)
    
    Output
    [{'title': 'TITLE1', 'url': 'url1'}, {'title': 'TITLE2', 'url': 'url2'}]
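
The answer mentions find_all() but only shows the CSS-selector form. Below is a minimal sketch of the find_all() variant, assuming the same structure as in the question (each article2 div wrapped in an a tag). The helper name extract_articles, the placeholder hrefs url1/url2, and the extra col_3 column are assumptions added for illustration. It uses find_parent('a') rather than find_previous('a') so the link is taken only from an enclosing anchor, and it skips articles that have no h1 instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; "url1"/"url2" are placeholders.
html = """
<div class="row">
  <div class="col_6 ct_12 cd_12 cm_12">
    <a href="url1">
      <div class="article2"><div class="text"><h1>TITLE1</h1></div></div>
    </a>
  </div>
  <div class="col_3 ct_12 cd_12 cm_12">
    <a href="url2">
      <div class="article2"><div class="text"><h1>TITLE2</h1></div></div>
    </a>
  </div>
  <div class="col_3 ct_12 cd_12 cm_12"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_articles(soup):
    """Collect title and link for every div.article2, skipping ones without an h1."""
    results = []
    # find_all matches on a single class, so the col_6/col_3 wrappers do not matter
    for article in soup.find_all("div", class_="article2"):
        heading = article.find("h1")
        if heading is None:
            continue  # no title to report for this article
        link = article.find_parent("a")  # the enclosing anchor, if there is one
        results.append({
            "title": heading.get_text(strip=True),
            "url": link.get("href") if link is not None else None,
        })
    return results


articles = extract_articles(soup)
print(articles)
```

Because the search targets the article2 class directly, it does not matter whether an article sits in a col_6 or a col_3 column, and empty columns are simply ignored.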