I've written a script in Python using pyppeteer to scrape the titles and links of different posts from a webpage. When I run my script, it parses only the title and link of the first post. My intention is to create a loop to get them all. As I'm very new to this library, I can't figure out how to create such a loop. Any help will be appreciated.
My script so far:
import asyncio
from pyppeteer import launch

async def get_titles_n_links():
    wb = await launch(headless=True)
    page = await wb.newPage()
    await page.goto('https://stackoverflow.com/questions/tagged/web-scraping')
    element = await page.querySelector('.question-hyperlink')
    title = await page.evaluate('(element) => element.textContent', element)
    link = await page.evaluate('(element) => element.href', element)
    print(f'{title}\n{link}\n')
    await wb.close()

asyncio.get_event_loop().run_until_complete(get_titles_n_links())
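page.querySelector() returns only the first matching element. Use page.querySelectorAll() to get all of them as a list, then loop over that list. Your code would look like this: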
import asyncio
from pyppeteer import launch

async def get_titles_n_links():
    wb = await launch(headless=True)
    page = await wb.newPage()
    await page.goto('https://stackoverflow.com/questions/tagged/web-scraping')
    elements = await page.querySelectorAll('.question-hyperlink')
    for element in elements:
        title = await page.evaluate('(element) => element.textContent', element)
        link = await page.evaluate('(element) => element.href', element)
        print(f'{title}\n{link}\n')
    await wb.close()

asyncio.get_event_loop().run_until_complete(get_titles_n_links())
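The loop above calls page.evaluate() twice per element, so every title and every link is a separate round trip to the browser. As a possible alternative, here is a minimal sketch that collects all titles and links in a single page.evaluate() call by doing the mapping in JavaScript. The helper name build_extractor and the url/selector parameters are my own additions for illustration, not part of pyppeteer or of the original script, and the try/finally is only there so the browser is closed even if the page fails to load.

```python
import asyncio
import json


def build_extractor(selector):
    """Return a JS arrow function (as a string) that collects title/link pairs.

    Hypothetical helper: json.dumps() quotes the selector so it can be
    embedded safely inside the JavaScript source.
    """
    return (
        "() => Array.from(document.querySelectorAll(%s))"
        ".map(el => ({title: el.textContent, link: el.href}))"
        % json.dumps(selector)
    )


async def get_titles_n_links(url, selector):
    # Imported here so build_extractor() can be used without pyppeteer installed.
    from pyppeteer import launch

    wb = await launch(headless=True)
    try:
        page = await wb.newPage()
        await page.goto(url)
        # One evaluate() call; the result comes back as a list of dicts.
        return await page.evaluate(build_extractor(selector))
    finally:
        await wb.close()


# Usage, with the same URL and selector as in the question:
# posts = asyncio.get_event_loop().run_until_complete(
#     get_titles_n_links('https://stackoverflow.com/questions/tagged/web-scraping',
#                        '.question-hyperlink'))
# for post in posts:
#     print(f"{post['title']}\n{post['link']}\n")
```

Whether this is noticeably faster depends on how many elements match; for a handful of posts the loop version is just as good and arguably easier to read.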