python-3.x dom web-scraping beautifulsoup urlopen

Accessing Hidden Tabs, Web Scraping With Python 3.6


I'm using bs4 and urllib.request in Python 3.6 to scrape a web page. To reach the div panels I need, I have to open tabs, i.e. toggle the "aria-expanded" attribute on the tab buttons.

The button tag looks like this when the tab is closed:

<button id="0-accordion-tab-0" type="button" class="accordion-panel-title u-padding-ver-s u-text-left text-l js-accordion-panel-title" aria-controls="0-accordion-panel-0" aria-expanded="false">

When the tab is opened, aria-expanded becomes "true" and the div panel appears underneath.

Any idea on how to do this?

Help would be super appreciated.


Solution

  • From your other post I'm guessing the URL is https://www.sciencedirect.com/journal/construction-and-building-materials/issues

    The web page loads JSON from another URL when you click a tab. You can request that JSON yourself without clicking anything. All you need is the ISSN, which never changes (09500618), and the year, which you can pass in from a range. This also returns data for the tabs that are already expanded.

    import json

    import requests

    # The site rejects requests from blacklisted user agents, so send a browser-like User-Agent header
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0'
    }

    for year in range(1999, 2019):  # 1999 through 2018
        url = "https://www.sciencedirect.com/journal/09500618/year/" + str(year) + "/issues"
        r = requests.get(url, headers=headers)
        r.raise_for_status()  # stop early if the request was rejected
        j = r.json()

        for d in j['data']:
            # Print the whole JSON object
            print(json.dumps(d, indent=4, sort_keys=True))
            # Or print specific values
            print(d['coverDateText'], d['volumeFirst'], d['uriLookup'], d['srctitle'])
    

    Outputs:

    {
        "cid": "271475",
        "contentFamily": "serial",
        "contentType": "JL",
        "coverDateStart": "19991201",
        "coverDateText": "1 December 1999",
        "hubStage": "H300",
        "issn": "09500618",
        "issueFirst": "8",
        "pages": [
            {
                "firstPage": "417",
                "lastPage": "470"
            }
        ],
        "pii": "S0950061800X00323",
        "sortField": "1999001300008zzzzzzz",
        "srctitle": "Construction and Building Materials",
        "uriLookup": "/vol/13/issue/8",
        "volIssueSupplementText": "Volume 13, Issue 8",
        "volumeFirst": "13"
    }
    1 December 1999 13 /vol/13/issue/8 Construction and Building Materials
    ...
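
If you then want to visit each issue, the relative "uriLookup" value can probably be joined onto the journal's URL. The following is only a sketch: the base URL is taken from the journal page guessed above, and the assumption that issue pages live directly under it is mine, not something the JSON confirms. The helper names issue_url and summarize are made up for illustration.

```python
from urllib.parse import urljoin

# Assumption: issue pages sit under the journal's landing URL (not confirmed by the JSON).
JOURNAL_BASE = "https://www.sciencedirect.com/journal/construction-and-building-materials/"


def issue_url(record, base=JOURNAL_BASE):
    """Join the base URL with a record's relative 'uriLookup' path (hypothetical helper)."""
    # uriLookup starts with '/', which urljoin would resolve against the site root,
    # so strip it to keep the path relative to the journal URL.
    return urljoin(base, record["uriLookup"].lstrip("/"))


def summarize(record):
    """Pick out a few fields from one entry of j['data'] (hypothetical helper)."""
    return {
        "date": record["coverDateText"],
        "volume": record["volumeFirst"],
        "url": issue_url(record),
    }


# A trimmed copy of the record shown in the output above.
sample = {
    "coverDateText": "1 December 1999",
    "volumeFirst": "13",
    "uriLookup": "/vol/13/issue/8",
    "srctitle": "Construction and Building Materials",
}

print(summarize(sample))
```

Inside the loop above you could call summarize(d) for each d in j['data'] and collect the results in a list instead of printing them.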