Web scraping data from a real estate website in Python doesn't work because the URL changes into something that can't be reloaded.
Hello,
I want to scrape property data from this real estate website in Australia in order to understand trends in property prices over time. I want to do this in Python.
I want comprehensive details for all properties. When I go to the second page of listings, the URL changes:
However, when I try to use the second URL in Python (using packages such as BeautifulSoup), it doesn't work. I've realised that the second URL won't load if I paste it into another tab either.
Any ideas on this? I'd like to know both how I can code it in Python and what is unique about this website that is making it difficult for me.
Just to give you an idea: you are working with the wrong URL. The listings are loaded through an AJAX call, so you have to send a POST request to that endpoint to get the data:
import requests

# AJAX endpoint the site calls when the listing pages are switched
url = "https://www.raywhite.com/wp-admin/admin-ajax.php?action=dispatch&_p=rwcom/list/search/internal/true"
# Form-encoded search parameters; the trailing "page=2" requests the second page of results
payload = "type=SAL&subtype=ANY&sort=&sortby=&status=CUR&baseUrl=&refpage=0&perpage=15&sortcombo=updated-DES---suburb&bedroom=any&bathroom=any&garage=any&event=&auctionmin=&auctionmax=&suburb=Homebush%2C+NSW+2140&radius=10&keywords=&agent=&office=&IF=&budgetmin=&budgetmax=&budgetdisplay=&landsizemin=&landsizemax=&floorsizemin=&floorsizemax=&comtype=&_s=buy&page=2"
headers = {
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'user-agent': 'some agent'
}
# Send the same POST request the browser makes
response = requests.post(url, headers=headers, data=payload)
print(response.text)
Open your browser's developer tools (Network tab) and look for the AJAX request that is sent when you switch to the next page.
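To walk through several result pages, one option is to rewrite the page field of the payload for each request. The sketch below is only an illustration: it assumes the `page` field is what selects the results page (suggested by the payload above, but not verified against the site), it uses a shortened copy of the payload, and `payload_for_page` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode

# Shortened copy of the payload from the answer; the full string works the same way.
BASE_PAYLOAD = "type=SAL&subtype=ANY&status=CUR&perpage=15&suburb=Homebush%2C+NSW+2140&radius=10&_s=buy&page=2"


def payload_for_page(base_payload, page):
    """Return base_payload with its 'page' field set to the given page number."""
    # keep_blank_values=True preserves empty fields such as 'sort=' in the full payload
    fields = dict(parse_qsl(base_payload, keep_blank_values=True))
    # Assumption: 'page' is the field the server uses to pick the results page
    fields["page"] = str(page)
    return urlencode(fields)


# Print the payloads for the first three pages
for page in range(1, 4):
    print(payload_for_page(BASE_PAYLOAD, page))
```

Each string produced this way can be passed as `data=` to the same POST request shown in the answer. It may be worth adding a short pause between requests so the site is not hit too quickly.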