Tags: python, web-scraping, python-requests, urllib

Python: how to scrape an AJAX website with requests


I found an XHR call that contains the data I want to scrape from a website. Several Stack Overflow posts say that once you have found the AJAX call, you only need to send a simple request to the same URL and read the data from the response. I tried doing so with this script:

import requests

url = 'https://prematch.planetwin365.it/api/mobileapp/v1.0/events/subevents/7/1?eventIds=7882&marketId=1' 
resp = requests.get(url) # tried switching to requests.post(url) without results
print(resp.text)

But the code just hangs and never terminates or prints any output. I also tried switching to the urllib module, but nothing changed.
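A side note on the hang itself: `requests` applies no timeout unless one is passed, so a server that accepts the connection but never answers will block the call indefinitely. The sketch below makes the script fail fast instead. The helper name `fetch_with_timeout` and the 10-second default are illustrative choices, not something taken from the site or from the original script.

```python
import requests


def fetch_with_timeout(url, timeout=10):
    """Return the response body, or None if the server does not answer in time.

    `timeout` is in seconds; requests applies it to connecting and to each
    wait for data from the server, not to the download as a whole.
    """
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None
    resp.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return resp.text
```

A `None` result here only shows that the server stayed silent; it does not say why. It is a quick way to tell "the server is ignoring this request" apart from "the script is stuck for some other reason".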

Here are the parameters of the XHR call:

[Two screenshots showing the parameters of the XHR call were attached here.]


Solution

  • After some time I found a way to make the script work: I just needed to add the cookies and their values to the request.

    import requests
    
    url = 'https://prematch.planetwin365.it/api/mobileapp/v1.0/events/subevents/7/1?eventIds=7882&marketId=1'
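    # Values copied from the browser's request; they are tied to that session
    # and will likely need to be refreshed once they expire.
    cookies = {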
        '_abck': '9E74104BD81F74BC1F7FE4F350725782~-1~YAAQBn16XKY4SGV/AQAAiyhkcAeyPf4pbo56aNcfqyMn8VMbBaeceOur9oNhuYnoPqJLsjnAoocCbDKkgZFOEWoiwlfNk6V6Udj4P/70hBovYJi2DKCMZB9HBGinWe9zrW2SsT3DbYtXQsRmavMP2j6peUAaz0szRjJ7DL5TR/ykRMDUw6tIT5Shdnt92hpyD1ZAIiUaIK/zDF3uovwlL3fEVdCUDLAm31PLdD0jpoyykElqiq5Hoa+PLmUUDO3jpnksaOcufWNi3xpJY20Kv0381tW/ItRynw7Eb6HFArdBaQFwlAM02at0sE4aUQVP96LO8RM0yEZVxPMYqf0MjZO+QxmRzganJc0ZSCiosKQJxTPf3Yqc7Zty0NIcUE+Ehg==~-1~-1~-1',
        'ak_bmsc': '718C62489D4C3B70E59118960293A120~000000000000000000000000000000~YAAQBn16XKc4SGV/AQAAjChkcA/rBuRQOYpDBHKR8qBQYKUQ8qss5cHy1xWge/7l0CMUrerZbMcXsARKXQWby6r5KhTnRZdhUs9Y7ye2L41kULV0/AUk0scTAlFloaKEGCLQr0+JJX1s6xsl0NhwwoVRheIFGXNec3OGa1sKpIxDbmxdVDA13UE2ro0+FomMrb4Rha/KrchRq8SB6X891og4rS7BZAiNN99bTDjfXpadkOyx6BOP/K5YzWFGIFySyYO2rl5GM6+6yZZgYWFwf/QuwNulNjBJcJwL18R1JoXJhdsEsohMaxcWRA+k+JEH6KBf8tWzXnxD/W7Zziu1tYWlqgKLXn2YSY4xZRpgkW/3y/Nk3h3/SIWKkZW7QqcyRmEVcw==',
        'bm_sz': 'E2F27C0E90271F31FED908CFCF988112~YAAQBn16XKg4SGV/AQAAjChkcA9A84+G6l+xysXyS5accePwqGAtL3XIjfqODlfHfczD3DNN2/8D2d88HJd6tAtS54VEo96LvEgGNsRsnUJdGygSKLRreqzkoc68h7eO7UtqkX6EAJUM4haPybJHwTgl5u63yMusZB7JrDeBa9IVW32Ie671N7b0RIE9SWEuYO6DD7AVj6gv5mX3/GNHfygXiIPqXpcUyRiJkRf+/Uu5ryWYTGkAGsWChI8liZC4afObCk9lT2vhIZlvdagFPXJ2ADbVD3Mk8EFionnfFX3qWxiPMe+UqA==~4470852~4274480'
    }
    
    def main():
        response = requests.get(url, cookies=cookies) 
        return response.text
    
    if __name__ == '__main__':
        print(main())
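Since these cookie values are copied by hand and will probably stop working once the browser session expires, it can be easier to paste the whole `Cookie` request header from the browser's network tab and convert it to a dict. This is a minimal sketch under that assumption. `parse_cookie_header` and `fetch` are hypothetical helper names, and the `timeout` argument is an addition that is not part of the solution above.

```python
import requests


def parse_cookie_header(raw):
    """Turn a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        # Split on the first '=' only: values such as base64 strings
        # may themselves contain '=' characters.
        name, sep, value = pair.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


def fetch(url, raw_cookie_header, timeout=10):
    """GET `url` with the cookies taken from a copied Cookie header."""
    response = requests.get(
        url,
        cookies=parse_cookie_header(raw_cookie_header),
        timeout=timeout,  # fail instead of hanging if the server stays silent
    )
    response.raise_for_status()
    return response.text
```

Usage would look like `fetch(url, '_abck=...; ak_bmsc=...; bm_sz=...')`, with the header string pasted from the browser. When the request starts hanging or failing again, the cookies have probably expired and need to be copied afresh.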