How to Scrape the Answer from a Quizlet Flashcard with BS4 and Requests?


Using this page as an example:

https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/

How would one scrape the answer text from the back of a flashcard? The answer is hidden until you click the card, which then flips over and shows it.

Here is what I have so far, but I'm fairly sure it isn't selecting the right element:

import requests
from bs4 import BeautifulSoup


def find_quizlet_flashcard_answer(quizlet_url):

    # desktop user-agent
    USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
    # mobile user-agent (defined but not used below)
    MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
    headers = {"user-agent": USER_AGENT}

    resp = requests.get(quizlet_url, headers=headers)

    # initialise so the return below cannot fail when nothing matches
    result = None
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        inner_divs = soup.find_all("div", {"aria-hidden": "true"})
        for g in inner_divs:
            result = g.text
            print(result)
    # text of the last matching div, or None if there was none
    return result

Solution

  • To get all questions and answers, you can select the term and definition links by their CSS classes, as in this example:

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    
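    # zip pairs each term with its definition, assuming both lists are in the same order
    questions = soup.select('a.SetPageTerm-wordText')
    answers = soup.select('a.SetPageTerm-definitionText')

    for i, (question, answer) in enumerate(zip(questions, answers), 1):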
        print('QUESTION {}'.format(i))
        print()
        print(question.get_text(strip=True, separator='\n'))
        print()
        print('ANSWER:')
        print(answer.get_text(strip=True, separator='\n'))
        print('-' * 160)
    

    Prints:

    QUESTION 1
    
    Which of the following are true regarding IPv4?
    a. 32-bit address
    b. 128-bit address
    c. Consists of a network ID and MAC address
    d. Consists of a host ID and MAC address
    
    ANSWER:
    a. 32-bit address
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    QUESTION 2
    
    How many bits does a standard IPv6 unicast address use to represent the network ID?
    a. 32
    b. 64
    c. 128
    d. 10
    
    ANSWER:
    b. 64
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    QUESTION 3
    
    Which of the following Windows PowerShell commands performs a DNS name query for www.contoso.com?
    a. ping www.contoso.com
    b. dnsquery www.contoso.com
    c. resolve-DNSName -Name www.contoso.com
    d. resolve-DNSquery www.comcast.net
    
    ANSWER:
    c. resolve-DNSName -Name www.contoso.com
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    ...and so on.
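
If you would rather have the pairs as data than printed output, the same selectors can be wrapped in a small function. This is only a sketch: `extract_pairs` and `SAMPLE_HTML` are hypothetical names, and the sample markup merely mimics the two class names used above so the snippet runs without a network request. It assumes the page still uses `a.SetPageTerm-wordText` and `a.SetPageTerm-definitionText`.

```python
from bs4 import BeautifulSoup


def extract_pairs(html):
    """Return a list of (question, answer) tuples found in the given HTML.

    Hypothetical helper: it relies on the same CSS classes as the example above.
    """
    soup = BeautifulSoup(html, "html.parser")
    questions = soup.select("a.SetPageTerm-wordText")
    answers = soup.select("a.SetPageTerm-definitionText")
    return [
        (q.get_text(strip=True, separator="\n"), a.get_text(strip=True, separator="\n"))
        for q, a in zip(questions, answers)
    ]


# Made-up markup for illustration only; the real page is more complex.
SAMPLE_HTML = """
<div>
  <a class="SetPageTerm-wordText"><span>How many bits does IPv4 use?<br>a. 32<br>b. 128</span></a>
  <a class="SetPageTerm-definitionText"><span>a. 32</span></a>
</div>
"""

pairs = extract_pairs(SAMPLE_HTML)
for question, answer in pairs:
    print(question)
    print("ANSWER:", answer)
```

For the live page, you would pass `requests.get(url, headers=headers).text` to `extract_pairs` instead of the sample string. If it returns an empty list, the class names probably no longer match the page's markup.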