As the title mentions, I'm attempting to grab data from several pages using aiohttp and asyncio. The problem is that the program grabs the info from the pages too quickly and then exits. The webpage needs to update its contents first (which can take a couple of seconds) and then refresh to display the properly updated contents, which are what I want to collect.
Is there a way I can load the page, wait a few seconds, refresh the page, and then read the contents of it? This is what my current fetch method looks like:
import aiohttp

async def fetch(session, url):
    with aiohttp.Timeout(10):  # overall 10-second timeout for the request
        async with session.get(url) as response:
            return await response.text()
When you load a url in your browser tab, the browser sends a request to get the url's content (in our case, just HTML text). The browser then searches this HTML for links (to images, to CSS, to scripts) and sends requests to load them too. As each of these links loads, the browser updates the view of your page; in particular, once a JavaScript link has loaded, the browser starts executing it, updating the page's HTML content. Only when every link needed to display the page has loaded and all scripts have executed is your page fully loaded.
Of this whole process, a request library like aiohttp does only the first step: it sends a request to get the url's content (response.text()). It won't load the script links inside that content, and it won't execute them to modify the content. What you're asking for can't be done with aiohttp.
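To make that concrete, here is a minimal sketch (the URL is a placeholder) showing what aiohttp actually hands back: the raw initial HTML, in which any script tags appear as plain text and none of their effects are applied:

import asyncio

import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://example.com") as response:  # placeholder URL
            html = await response.text()
    # html is the document exactly as the server sent it; nothing
    # that JavaScript would have inserted or updated is present here.
    print(html)

asyncio.run(main())

Waiting and re-requesting won't help either: a second GET just returns the same unexecuted initial HTML.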
If you need the content as it looks after JavaScript has executed, you need a much more complicated browser-based solution, such as PyQt (which embeds a real browser engine).
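For illustration, here is a minimal sketch of that kind of browser-based approach using Selenium, a browser-automation tool (an alternative to the PyQt route mentioned above, not something the answer prescribes). It assumes chromedriver is installed and uses a placeholder URL, but it follows the exact load-wait-refresh-read sequence the question asks for:

import time

from selenium import webdriver

driver = webdriver.Chrome()  # drives a real browser, so JavaScript actually runs
try:
    driver.get("http://example.com")  # placeholder URL
    time.sleep(5)                     # give the page's scripts time to update the content
    driver.refresh()                  # reload, as described in the question
    time.sleep(2)                     # wait for the refreshed content to settle
    html = driver.page_source         # the HTML *after* JavaScript has modified it
finally:
    driver.quit()

In real code, an explicit wait on a specific element (Selenium's WebDriverWait) is more robust than fixed sleeps, but fixed sleeps match the "wait a few seconds" flow asked about here.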