python html web-scraping beautifulsoup python-webbrowser

Scraping data from website Python - After interaction


Hello guys!

A friend of mine has to do a lot of typing for school in her IT classes. That means she has to learn how to type fast on the keyboard. As lazy as she is, she asked me if I had any idea how she could get her texts typed on https://at4.typewriter.at/index.php?r=site/index without actually doing anything. I thought to myself "hey, that's a cool idea, I'll look into it".

[Screenshot: what the website looks like]

That's the website where she has to type. There is a <span id="actualLetter"> tag with the current char that has to be typed and another <span id="remainingText"> with the remaining text. I've been able to scrape the first "actualLetter" with BeautifulSoup and open the website with webbrowser. The problem is that on first load the span "remainingText" does not contain 100% of the remaining text. After the first letter has been typed, the span updates to the "full" text and I could scrape it. After scraping it, I'd just let the Python program type it with pynput.keyboard.

The problem I am facing is that I have no idea how to scrape data from a website that has already been opened in a web browser, i.e. a page that has already been edited or interacted with. I'm happy about any advice or solutions!

Thanks!


Solution

  • Normally, you'd have people asking for what you've tried so far and your code, but I understand you're really in the dark on how to even get started with this problem.

    If you need the Python script to be able to step in after the user has interacted with the site, you're in for a massive challenge. There are many variables, like what browser is being used, on what operating system, at what resolution, with what settings, etc.

    Interacting with a live application will be fairly hard, although not impossible. If the site can be operated entirely using the keyboard and you can find some reliable sequence of keyboard inputs that reaches the right controls, that could be an approach, and libraries like pywin32 could provide access to the Windows API calls you'd need to send input to the browser window.

    However, a better approach may be to just cut out the user altogether and have the script perform all the interaction. You can do that through something like selenium and a driver like ChromeDriver that basically allows you to operate a website, with all its scripting, like a user would.

    You should probably look into either of these approaches and come up with a basic attempt, then ask more specific questions if you run into problems.
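
To make the Selenium route a bit more concrete, here is a minimal sketch, not a tested solution for this particular site. It assumes Selenium 4 with Chrome and a matching driver available, and it assumes the page really exposes the current character in an element with id "actualLetter" as described in the question. The function names `type_lesson` and `run_in_browser`, the `max_chars` cap, the delay and the "stop when the letter is empty" rule are all my own placeholders. Instead of scraping "remainingText" once, it re-reads the current letter from the live page before every keystroke, which sidesteps the problem that the remaining text is incomplete at first.

```python
import time


def type_lesson(read_letter, send_key, max_chars=500, delay=0.05):
    """Read the current letter and type it, until no letter is left.

    read_letter: callable returning the next character to type ("" when done).
    send_key:    callable that sends one character to the page.
    max_chars:   safety cap so the loop cannot run forever (assumed value).
    """
    typed = []
    for _ in range(max_chars):
        letter = read_letter()
        if not letter:  # assumption: an empty/missing letter means the lesson is over
            break
        send_key(letter)
        typed.append(letter)
        time.sleep(delay)  # give the page's own script time to update the span
    return "".join(typed)


def run_in_browser(url):
    """Open the site in a Selenium-controlled Chrome and type the lesson."""
    # Imported here so type_lesson can be tried out without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # The user still logs in and starts a lesson by hand in this window.
        input("Start a lesson in the opened browser window, then press Enter here... ")

        def read_letter():
            # find_elements returns an empty list instead of raising if the id is absent.
            elements = driver.find_elements(By.ID, "actualLetter")
            if not elements:
                return ""
            # textContent keeps whitespace; .text would strip a lone space.
            return elements[0].get_attribute("textContent") or ""

        def send_key(char):
            # Sends the key to whatever element currently has focus on the page.
            ActionChains(driver).send_keys(char).perform()

        return type_lesson(read_letter, send_key)
    finally:
        driver.quit()
```

You would start it with something like `run_in_browser("https://at4.typewriter.at/index.php?r=site/index")`. Because the browser is launched by the script, there is no need to attach to a window that was opened separately with webbrowser. How the site displays spaces or line breaks in "actualLetter" is something you'd have to check in the browser's inspector, so `read_letter` may need adjusting.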