I am trying to scrape tweets under a hashtag with Python and Selenium, and I use the following code to scroll down:
driver.execute_script('window.scrollTo(0,document.body.scrollHeight);')
The problem is that Selenium only scrapes the tweets currently shown (only 3), then scrolls to the end of the page, loads more tweets, and scrapes 3 new ones, missing a lot of tweets in between.
Is there a way to show all tweets, then scroll down and show all the new tweets, or at least some new ones? (I have a mechanism to filter already-scraped tweets.)
Note: I'm running my script on a GCP VM, so I can't rotate the screen.
I think I could make the script keep pressing the down arrow; that way tweets would be displayed one by one so I could scrape them while more keep loading, but I suspect this would slow the scraper down a lot.
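For reference, that key-press idea could be sketched as below. The function name and its `presses`/`pause` parameters are hypothetical tuning knobs, not Selenium API names; only `find_element` and `send_keys` are real Selenium calls.

```python
from time import sleep

def scroll_by_key_presses(driver, presses=10, pause=0.2, key=None):
    """Scroll a few lines at a time by sending ARROW_DOWN to <body>.

    `driver` is assumed to be a Selenium WebDriver; `presses` and `pause`
    are illustrative knobs for how far and how fast to scroll.
    """
    if key is None:
        from selenium.webdriver.common.keys import Keys  # real Selenium constant
        key = Keys.ARROW_DOWN
    # "tag name" is the locator strategy string behind By.TAG_NAME
    body = driver.find_element("tag name", "body")
    for _ in range(presses):
        body.send_keys(key)   # each press scrolls roughly one line of tweets
        sleep(pause)          # short pause so each batch can render
```

As you suspected, the per-press pause is what makes this slow; the pixel-step scrolling in the answer below covers more ground per iteration.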
Scroll down the page in small pixel steps, so the page has time to load data and the tweets in between stay visible long enough to scrape. Try the code below:
from time import sleep

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollBy(0, 800);")  # adjust the step size ('800') to taste
    sleep(1)  # give the page time to load new tweets
    # only stop once we are at the bottom AND no new content was loaded;
    # checking height alone would break mid-page, since an 800px step
    # doesn't change document.body.scrollHeight
    at_bottom = driver.execute_script(
        "return window.pageYOffset + window.innerHeight >= document.body.scrollHeight;")
    new_height = driver.execute_script("return document.body.scrollHeight")
    if at_bottom and new_height == last_height:
        break
    last_height = new_height
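The "filter already scraped tweets" mechanism mentioned in the question can be kept as a simple seen-set keyed on tweet id, so duplicates picked up across overlapping scroll steps are dropped. The function and the `(tweet_id, text)` pair shape here are illustrative, not Twitter's or Selenium's API:

```python
def collect_new(tweets, seen_ids):
    """Return only tweets whose id has not been scraped before.

    `tweets` is an iterable of (tweet_id, text) pairs scraped from the
    current viewport; `seen_ids` is a set updated in place across scroll
    steps. Both names are hypothetical placeholders.
    """
    fresh = []
    for tweet_id, text in tweets:
        if tweet_id in seen_ids:
            continue  # already collected on an earlier scroll step
        seen_ids.add(tweet_id)
        fresh.append((tweet_id, text))
    return fresh
```

Calling this once per loop iteration, before the next `scrollBy`, means overlapping batches cost nothing but a set lookup.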