python, web-scraping, web

Scraping data off Real Estate Website (RayWhite)


Web scraping data off a real estate website in Python doesn't work because the URL changes into something that can't be reloaded.

Hello,

I want to scrape property data off this real estate website in Australia in order to understand trends in property prices over time. I want to do this in Python.

https://www.raywhite.com/buy/?type=SAL&_se=buy&budgetmax=&budgetdisplay=&subtype=ANY&suburb=Homebush%2C+NSW+2140&radius=0&radius=10&bedroom=any&bathroom=any&garage=any&budgetmin=&_s=buy

I want comprehensive details for all properties. When I go to the second page of listings, the URL changes:

https://www.raywhite.com/buy/2/?type=SAL&_se=buy&budgetmax=&budgetdisplay=&subtype=ANY&suburb=Homebush%2C+NSW+2140&radius=0&radius=10&bedroom=any&bathroom=any&garage=any&budgetmin=&_s=buy

However, when I try to use the second URL in Python (using packages such as BeautifulSoup), it doesn't work. I've realised that the second URL won't load if I paste it into another tab either.

Any ideas on this? I'd like to know both how I can code it in Python and what is unique about this website that makes it difficult to scrape.


Solution

  • Just to give you an idea: you are working with the wrong URL. The listings are loaded by an AJAX call, so you have to send a POST request to that endpoint to get the data:

    import requests
    
    # AJAX endpoint the site calls when you switch between result pages
    url = "https://www.raywhite.com/wp-admin/admin-ajax.php?action=dispatch&_p=rwcom/list/search/internal/true"
    
    # Form-encoded search parameters; the trailing "page=2" selects the results page
    payload = "type=SAL&subtype=ANY&sort=&sortby=&status=CUR&baseUrl=&refpage=0&perpage=15&sortcombo=updated-DES---suburb&bedroom=any&bathroom=any&garage=any&event=&auctionmin=&auctionmax=&suburb=Homebush%2C+NSW+2140&radius=10&keywords=&agent=&office=&IF=&budgetmin=&budgetmax=&budgetdisplay=&landsizemin=&landsizemax=&floorsizemin=&floorsizemax=&comtype=&_s=buy&page=2"
    headers = {
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'user-agent': 'some agent'  # placeholder; replace with a real user-agent string
    }
    
    response = requests.post(url, headers=headers, data=payload)
    
    print(response.text)
    

    Check your browser's developer tools (Network tab) and look for the AJAX request that is sent when you switch to the next page.
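
To collect more than one page, the same request can probably be repeated with a different `page` value. The following is only a rough sketch under that assumption: the endpoint and form fields are copied from the request above, while `payload_for_page`, `fetch_pages`, the one-second delay and the 30-second timeout are hypothetical choices made for illustration. It has not been checked against the live site, and the format of the response (HTML or JSON) is not assumed here.

```python
import time
from urllib.parse import parse_qsl, urlencode

# Endpoint and form body copied from the request shown in the answer.
AJAX_URL = (
    "https://www.raywhite.com/wp-admin/admin-ajax.php"
    "?action=dispatch&_p=rwcom/list/search/internal/true"
)
BASE_PAYLOAD = (
    "type=SAL&subtype=ANY&sort=&sortby=&status=CUR&baseUrl=&refpage=0"
    "&perpage=15&sortcombo=updated-DES---suburb&bedroom=any&bathroom=any"
    "&garage=any&event=&auctionmin=&auctionmax=&suburb=Homebush%2C+NSW+2140"
    "&radius=10&keywords=&agent=&office=&IF=&budgetmin=&budgetmax="
    "&budgetdisplay=&landsizemin=&landsizemax=&floorsizemin=&floorsizemax="
    "&comtype=&_s=buy&page=2"
)


def payload_for_page(page: int) -> str:
    """Return the form body with its `page` field set to `page` (hypothetical helper)."""
    # keep_blank_values=True preserves empty fields such as `sort=`.
    fields = dict(parse_qsl(BASE_PAYLOAD, keep_blank_values=True))
    fields["page"] = str(page)
    return urlencode(fields)


def fetch_pages(first: int, last: int, delay: float = 1.0):
    """Yield (page, response_text) for every page from `first` to `last` inclusive."""
    # Imported here so payload_for_page works even without requests installed.
    import requests

    headers = {
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "user-agent": "some agent",  # placeholder, as in the answer
    }
    with requests.Session() as session:
        for page in range(first, last + 1):
            response = session.post(
                AJAX_URL, headers=headers, data=payload_for_page(page), timeout=30
            )
            response.raise_for_status()  # stop on HTTP errors
            yield page, response.text
            time.sleep(delay)  # pause between requests to go easy on the server
```

Calling `for page, text in fetch_pages(1, 3): print(page, len(text))` would then request the first three pages, assuming the site numbers pages from 1. Inspect the printed response first to see what format it comes back in before deciding how to parse it.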