I'm using bs4 and urllib.request in Python 3.6 to scrape a web page. To reach the divs I need, I have to open the tabs, i.e. toggle the "aria-expanded" attribute on each tab's button.
When the tab is closed, the button looks like this:
    <button id="0-accordion-tab-0" type="button" class="accordion-panel-title u-padding-ver-s u-text-left text-l js-accordion-panel-title" aria-controls="0-accordion-panel-0" aria-expanded="false">
When the tab is opened, aria-expanded becomes "true" and the div appears underneath.
Any idea on how to do this?
Help would be super appreciated.
From your other post I'm guessing the URL is https://www.sciencedirect.com/journal/construction-and-building-materials/issues
The web page loads JSON from another URL when you click a tab. You can request that JSON yourself without clicking anything. All you need is the ISSN, which never changes (09500618), and the year, which you can take from a range. This also returns the data for tabs that are already expanded.
    import json

    import requests

    # The website rejects requests from blacklisted user agents,
    # so send a browser-like User-Agent header
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0'
    }

    # One request per year, 1999 to 2018 inclusive
    for i in range(1999, 2019):
        url = "https://www.sciencedirect.com/journal/09500618/year/" + str(i) + "/issues"
        r = requests.get(url, headers=headers)
        j = r.json()
        for d in j['data']:
            # Print the whole JSON object
            print(json.dumps(d, indent=4, sort_keys=True))
            # Or print specific values
            print(d['coverDateText'], d['volumeFirst'], d['uriLookup'], d['srctitle'])
Outputs:
    {
        "cid": "271475",
        "contentFamily": "serial",
        "contentType": "JL",
        "coverDateStart": "19991201",
        "coverDateText": "1 December 1999",
        "hubStage": "H300",
        "issn": "09500618",
        "issueFirst": "8",
        "pages": [
            {
                "firstPage": "417",
                "lastPage": "470"
            }
        ],
        "pii": "S0950061800X00323",
        "sortField": "1999001300008zzzzzzz",
        "srctitle": "Construction and Building Materials",
        "uriLookup": "/vol/13/issue/8",
        "volIssueSupplementText": "Volume 13, Issue 8",
        "volumeFirst": "13"
    }
    1 December 1999 13 /vol/13/issue/8 Construction and Building Materials
    ...
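Since the question uses urllib.request rather than requests, here is a rough sketch of the same idea with only the standard library. The helper names (`issues_url`, `fetch_issues`, `summarize`) are made up for illustration, and the fallback to an empty list when the 'data' key is missing is my assumption about the response, not something I have confirmed. The demo at the bottom runs on a record copied from the output above, so it needs no network access.

```python
import json
import urllib.request

ISSN = "09500618"  # ISSN of the journal, taken from the JSON output above
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
}


def issues_url(year, issn=ISSN):
    """Build the per-year JSON URL (hypothetical helper)."""
    return "https://www.sciencedirect.com/journal/{}/year/{}/issues".format(issn, year)


def fetch_issues(year, issn=ISSN):
    """Download and decode one year's issue list (needs network access)."""
    request = urllib.request.Request(issues_url(year, issn), headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: fall back to an empty list if 'data' is absent.
    return payload.get("data", [])


def summarize(issue):
    """Keep only the fields of interest; missing keys become None."""
    keys = ("coverDateText", "volumeFirst", "issueFirst", "uriLookup")
    return {key: issue.get(key) for key in keys}


# Offline demo on a record shaped like the output shown above.
sample = {
    "coverDateText": "1 December 1999",
    "issueFirst": "8",
    "srctitle": "Construction and Building Materials",
    "uriLookup": "/vol/13/issue/8",
    "volumeFirst": "13",
}
print(summarize(sample))
```

To run it against the live site you would loop over `range(1999, 2019)` and call `summarize` on each item returned by `fetch_issues(year)`. Using `.get` instead of indexing means a record that lacks one of the fields does not stop the whole loop with a KeyError.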