Using this page as an example:
https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/
How would one scrape the answer text from the back of a flashcard? It's hidden until you click the card, which flips it over to show the answer.
Here is what I have so far, but I'm fairly sure it isn't selecting the right element:
```python
import requests
from bs4 import BeautifulSoup


def find_quizlet_flashcard_answer(quizlet_url):
    # desktop user-agent
    USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
    # mobile user-agent (alternative; not used below)
    MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(quizlet_url, headers=headers)
    results = []
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        # Every <div aria-hidden="true"> on the page; this is the selector
        # that does not seem to match the card's answer text
        inner_divs = soup.find_all("div", {"aria-hidden": "true"})
        for g in inner_divs:
            result = g.text
            print(result)
            # collect every match instead of returning only the last one
            results.append(result)
    return results
```
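Rather than guessing at attributes such as `aria-hidden`, one way to find the right element is to search the parsed HTML for an answer you already know and look at the tag that contains it. Below is a minimal sketch, assuming BeautifulSoup 4 and assuming the answer text is present in the HTML that `requests` downloads (text injected later by JavaScript will not be found this way). The helper name `find_tags_containing` and the sample HTML are hypothetical, for illustration only.

```python
from bs4 import BeautifulSoup


def find_tags_containing(html, needle):
    """Return (tag name, class list) for each element whose text node contains needle."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    # string=<callable> searches text nodes; .parent is the tag that encloses each one
    for text_node in soup.find_all(string=lambda s: s is not None and needle in s):
        tag = text_node.parent
        matches.append((tag.name, list(tag.get("class", []))))
    return matches


# Hypothetical HTML fragment standing in for a downloaded page
sample = """
<div class="card"><a class="SetPageTerm-wordText">Which of the following are true regarding IPv4?</a>
<a class="SetPageTerm-definitionText">a. 32-bit address</a></div>
"""
print(find_tags_containing(sample, "32-bit address"))
# → [('a', ['SetPageTerm-definitionText'])]
```

On a real page you would pass `resp.text` instead of `sample`; the class names it reports are candidates for a CSS selector.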
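You don't need to simulate the click: the answers are already in the page's HTML, in the term list (`a.SetPageTerm-wordText` for the questions and `a.SetPageTerm-definitionText` for the answers). To get all questions and answers you can use this example: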
```python
import requests
from bs4 import BeautifulSoup

url = 'https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

for i, (question, answer) in enumerate(zip(soup.select('a.SetPageTerm-wordText'), soup.select('a.SetPageTerm-definitionText')), 1):
    print('QUESTION {}'.format(i))
    print()
    print(question.get_text(strip=True, separator='\n'))
    print()
    print('ANSWER:')
    print(answer.get_text(strip=True, separator='\n'))
    print('-' * 160)
```
Prints:
```
QUESTION 1

Which of the following are true regarding IPv4?
a. 32-bit address
b. 128-bit address
c. Consists of a network ID and MAC address
d. Consists of a host ID and MAC address

ANSWER:
a. 32-bit address
----------------------------------------------------------------------------------------------------------------------------------------------------------------
QUESTION 2

How many bits does a standard IPv6 unicast address use to represent the network ID?
a. 32
b. 64
c. 128
d. 10

ANSWER:
b. 64
----------------------------------------------------------------------------------------------------------------------------------------------------------------
QUESTION 3

Which of the following Windows PowerShell commands performs a DNS name query for www.contoso.com?
a. ping www.contoso.com
b. dnsquery www.contoso.com
c. resolve-DNSName -Name www.contoso.com
d. resolve-DNSquery www.comcast.net

ANSWER:
c. resolve-DNSName -Name www.contoso.com
----------------------------------------------------------------------------------------------------------------------------------------------------------------
```
...and so on.
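If you want the pairs as data rather than printed text, the same selectors can be wrapped in a function. This is a minimal sketch, assuming the class names `SetPageTerm-wordText` and `SetPageTerm-definitionText` used above are still what the page serves (Quizlet can change its markup at any time). The function name `extract_terms` and the sample HTML are hypothetical. It also checks that both lists have the same length, because `zip()` silently drops unmatched items.

```python
from bs4 import BeautifulSoup


def extract_terms(html):
    """Return a list of (question, answer) strings from a Quizlet set page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    questions = soup.select("a.SetPageTerm-wordText")
    answers = soup.select("a.SetPageTerm-definitionText")
    # zip() stops at the shorter list, so mismatched counts would silently lose cards
    if len(questions) != len(answers):
        raise ValueError(f"found {len(questions)} terms but {len(answers)} definitions")
    return [
        (q.get_text(strip=True, separator="\n"), a.get_text(strip=True, separator="\n"))
        for q, a in zip(questions, answers)
    ]


# Hypothetical markup modeled on the printed output above
sample = (
    '<a class="SetPageTerm-wordText">How many bits does a standard IPv6 unicast address '
    'use to represent the network ID?<br>a. 32<br>b. 64<br>c. 128<br>d. 10</a>'
    '<a class="SetPageTerm-definitionText">b. 64</a>'
)
for question, answer in extract_terms(sample):
    print(question)
    print("ANSWER:", answer)
```

For the live page, pass `requests.get(url, headers=headers).text` instead of `sample`.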