I wanted to scroll down a web page using Selenium and found this question: How can I scroll a web page using selenium webdriver in python?
I took the code shown there:
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
It works fine, but it causes a problem in my main code. I want to parse Twitter. If a Twitter account has a long timeline, the page's HTML contains only a few tweets at any moment, not all of the account's tweets.
Example: as I scroll down, the HTML contains only the tweets that are currently visible to me. Because of this I can't capture all the tweets. The code above scrolls the page too quickly. How can I slow the scrolling down?
I tried to solve it and wrote this crude code:
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)
# Scroll down to bottom
y = 600
finished = False
while True:
    for timer in range(0, 100):
        driver.execute_script("window.scrollTo(0, " + str(y) + ")")
        y += 600
        sleep(1)
        new_height = driver.execute_script("return document.body.scrollHeight")
        print(new_height, last_height)
        if new_height == last_height:  # on the first iteration new_height equals last_height
            print('stop')
            finished = True
            break
        last_height = new_height
    if finished:
        break
This code doesn't work: on the first iteration new_height equals last_height. Please help me.
If you can fix my code, please do. If you can write another, more elegant solution, please post it.
UPD:
The scrolling has to be infinite. For example, I scroll down a Facebook account until I have scrolled through it fully. That's why I have the last_height and new_height variables. In my code, when last_height equals new_height, it means the page has been scrolled to the end and we can stop scrolling (we can exit). But I missed something, and my code doesn't work.
I have worked on a Twitter bot. When you scroll down, Twitter updates the page's HTML and removes some tweets from above. The algorithm I used is:
current_height = DriverWrapper.cd.execute_script("return document.body.scrollHeight")
If new_height == current_height, stop; otherwise repeat from the 2nd step.
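Here is a minimal sketch of how a loop like that could look. It is not the exact code from my bot. The function name scroll_and_collect, the collect callback and the step, pause and max_steps parameters are all made up for illustration; the only thing taken from Selenium is driver.execute_script. The idea is to scroll in small steps, harvest whatever is in the DOM after every step (before the page removes it), and stop only once the viewport has reached the bottom and the height has stopped growing.

```python
import time


def scroll_and_collect(driver, collect, step=600, pause=1.0, max_steps=1000):
    """Scroll down in small steps and collect items after every step.

    driver  -- any object exposing Selenium's execute_script(script)
    collect -- hypothetical callback returning the items currently in the DOM
               (must return hashable values, e.g. tweet texts or ids)
    """
    seen = set()
    current_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_steps):
        # Harvest what is rendered right now, before the page drops it.
        seen.update(collect(driver))
        # Scroll by a small step instead of jumping to the bottom.
        driver.execute_script("window.scrollBy(0, %d);" % step)
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset"
            " >= document.body.scrollHeight - 1;"
        )
        # Comparing heights alone stops too early: a small step does not
        # trigger loading, so the height is unchanged. Also require that
        # the viewport has actually reached the bottom.
        if at_bottom and new_height == current_height:
            break
        current_height = new_height
    seen.update(collect(driver))  # final pass over the last screen
    return seen
```

For collect you would pass something site-specific, for example a function that finds the tweet elements on the page and returns their text. If the site loads new content slower than pause seconds, the loop may stop early, so increase pause in that case.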