web-scraping, python-requests, get, xmlhttprequest

How to recover a hidden ID from the query string of an XHR GET request?


I'm trying to use the hidden Airbnb API. I need to work out where the ID in the query string of a GET request comes from. For example, take this listing:

https://www.airbnb.ca/rooms/47452643

The "public" ID is shown to be 47452643. However, another ID is needed to use the API.

If you look at the XHR requests in Chrome, you'll see a request starting with "StaysPdpSections?operationName". This is the request I want to replicate. If I copy the request into Insomnia or Postman, I see a variable in the query string starting with:

"variables":"{"id":"U3RheUxpc3Rpbmc6NDc0NTI2NDM="

The hidden ID "U3RheUxpc3Rpbmc6NDc0NTI2NDM=" is what I need: it must be inserted into the query string to get the data from this request. How can I recover this hidden ID dynamically for each listing?


Solution

  • That target ID is buried really deep in the HTML:

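    import json

    import requests
    from bs4 import BeautifulSoup as bs

    url = 'https://www.airbnb.ca/rooms/47452643'
    req = requests.get(url)
    req.raise_for_status()  # stop early on an HTTP error

    # The page embeds its initial state as JSON in <script id="data-state">
    soup = bs(req.content, 'html.parser')
    script = soup.select_one('script[type="application/json"][id="data-state"]')
    data = json.loads(script.text)

    # These hard-coded indices depend on the page's JSON layout and may change
    target = data['niobeMinimalClientData'][2][1]['variables']
    print(target.get('id'))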
    

    Output:

    U3RheUxpc3Rpbmc6NDc0NTI2NDM=
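
Decoding the value above as Base64 gives `StayListing:47452643`, which is the public ID behind a `StayListing:` prefix. Below is a minimal sketch that builds the hidden ID without fetching the page. It assumes other listings follow the same pattern, which is only confirmed here for this one ID. The function names `encode_listing_id` and `decode_listing_id` are hypothetical, chosen for illustration.

```python
import base64


def encode_listing_id(public_id, prefix="StayListing"):
    """Build the Base64 ID from a public listing ID.

    Assumes the '<prefix>:<public_id>' pattern observed for one listing.
    """
    raw = f"{prefix}:{public_id}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def decode_listing_id(hidden_id):
    """Split a Base64 ID back into (prefix, public_id)."""
    raw = base64.b64decode(hidden_id).decode("ascii")
    prefix, _, public_id = raw.partition(":")
    return prefix, public_id


print(decode_listing_id("U3RheUxpc3Rpbmc6NDc0NTI2NDM="))  # ('StayListing', '47452643')
print(encode_listing_id(47452643))  # U3RheUxpc3Rpbmc6NDc0NTI2NDM=
```

If the pattern stops holding for some listing, scraping the `data-state` script as shown above remains the fallback.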