Tags: python, parsing, beautifulsoup, request, urllib

Can't parse full webpage, using BeautifulSoup


I am trying to parse a page of the Q&A service sli.do:

import urllib.request
from bs4 import BeautifulSoup

voting_url = "https://app.sli.do/event/i6jqiqxm/live/questions"
voting_page = urllib.request.urlopen(voting_url)

soup = BeautifulSoup(voting_page, 'lxml')

print(soup.prettify())

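# Select spans by CSS class; .get('Linkify') would look up an attribute named "Linkify"
for span in soup.find_all('span', class_='Linkify'):
    print(span.get_text())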

print(soup.prettify()) returns an HTML document, but it contains no span elements with class="Linkify", which are the ones that hold the question text. Those elements are visible when the page is opened in Chrome: https://app.sli.do/event/i6jqiqxm/live/questions
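
A quick way to check whether the spans are missing from the downloaded markup itself, rather than just mis-selected, is to count them in the raw HTML. The following is only a sketch using the standard library: `SpanClassCounter` and `count_spans_with_class` are hypothetical helpers, and the two sample strings are made-up stand-ins for a page that ships its content in the HTML and for an empty shell that is filled in later by a script.

```python
from html.parser import HTMLParser


class SpanClassCounter(HTMLParser):
    """Counts <span> start tags whose class attribute includes a given name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # The class attribute may hold several space-separated names, or be absent.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_spans_with_class(html, class_name):
    """Return how many <span class="..."> tags in html carry class_name."""
    parser = SpanClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Made-up samples, not real sli.do markup.
static_html = '<div><span class="Linkify">Is Slido free to use?</span></div>'
js_shell_html = '<div id="app"></div><script src="main.js"></script>'

print(count_spans_with_class(static_html, "Linkify"))    # 1
print(count_spans_with_class(js_shell_html, "Linkify"))  # 0
```

A count of 0 on the decoded response body would suggest that the spans are created after the page loads, so no parser can find them in that response.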


Solution

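  • The question data is generated dynamically, after the initial HTML is loaded, so it is not in the page that urllib downloads. You can request it from the API instead. You may need to work out how the access_token is obtained if that also changes dynamically.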

    import requests

    # Event UUID and API base URL used for both requests below
    event_uuid = '8ca635b0-e80e-47be-b506-cb131dbbed4c'
    api_base = 'https://app.sli.do/api/v0.5/events/%s' % event_uuid

    s = requests.Session()

    # Ask the auth endpoint for an access token for this event
    auth = s.post(api_base + '/auth').json()
    access_token = auth['access_token']

    url = api_base + '/questions'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
        'authorization': 'Bearer %s' % access_token,
    }
    payload = {
        'path': '/questions',
        'eventSectionId': '4145620',
        'sort': 'top',
        'highlighted_first': 'true',
        'limit': '9999',
    }

    # The response is a JSON list with one object per question
    json_data = s.get(url, headers=headers, params=payload).json()

    for each in json_data:
        print(each['text'])
    

    Output:

    Can I ask a question anonymously?
    How many participants does Slido support?
    Do participants need an account to join?
    Can I download the list of questions  from my Q&A?
    Can the moderators control what questions are seen?
    How do you pronounce Slido?
    Is it possible to change the colors of Slido so that they match our branding? 🎨
    What tools does Slido integrate with?
    Is it easy to ask a question? 
    Can i send a link to participants prior to event?
    Can participants submit questions at any time?
    Is there a profanity control for the text of the questions? 
    Is there an option to have a name required?
    Is Slido free to use?
    Is Slido good for a regular meeting q&a with the CEO where you can ask questions anonymously in advance?
    how do i upload slido into my powerpoint presentation?
    Can everyone see each other's questions?
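
If the response ever contains entries without a usable `text` field, the final loop above would raise a `KeyError`. The following is a minimal sketch of a more defensive extraction step, assuming only what the answer shows: that the response is a JSON list of objects with a `text` key. `extract_question_texts` is a hypothetical helper, and the sample data (including the `score` field) is made up for illustration rather than taken from the real API.

```python
import json


def extract_question_texts(questions):
    """Return the 'text' of each question object, skipping malformed entries."""
    texts = []
    for item in questions:
        # Keep only dict entries whose 'text' value is a string.
        if isinstance(item, dict) and isinstance(item.get("text"), str):
            texts.append(item["text"].strip())
    return texts


# Made-up sample shaped like the assumed response: a JSON list of objects.
sample_response = json.loads("""
[
  {"text": "Is Slido free to use?", "score": 3},
  {"text": "How do you pronounce Slido? "},
  {"score": 1},
  "unexpected"
]
""")

for text in extract_question_texts(sample_response):
    print(text)
```

With the code from the answer, the same helper could be called as `extract_question_texts(json_data)` in place of the final loop.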