
Instagram web scraping with Selenium in Python: only the last 30 post links are found


I have a problem scraping all the pictures from an Instagram profile. I scroll the page to the bottom and then find all "a" tags, but I always get only the last 30 links to pictures. I think the driver doesn't see the full content of the page.

import time

from selenium.webdriver.common.by import By

# driver is an already-created WebDriver with the profile page open

# scroll until the page height stops changing
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"
scrolldown = driver.execute_script(SCROLL_JS)
match = False
while not match:
    last_count = scrolldown
    time.sleep(2)
    scrolldown = driver.execute_script(SCROLL_JS)
    if last_count == scrolldown:
        match = True

# posts: collected only once, after scrolling has finished
posts = []
time.sleep(2)
links = driver.find_elements(By.TAG_NAME, 'a')
for link in links:
    post = link.get_attribute('href')
    if post and '/p/' in post:  # <a> tags without an href return None
        posts.append(post)
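If the profile grid keeps only a window of recently loaded posts in the DOM and drops older ones as you scroll (an assumption; Instagram's markup is undocumented and changes often), the symptom can be reproduced without a browser. `FakeProfilePage` below is a hypothetical stand-in for the page, not anything from Selenium or Instagram: each scroll renders 12 more posts, and only the last 30 stay in the "DOM".

```python
class FakeProfilePage:
    """Hypothetical stand-in for a virtualized profile grid:
    only the most recent `window` posts stay in the DOM."""

    def __init__(self, total_posts=100, batch=12, window=30):
        self.total_posts = total_posts
        self.batch = batch
        self.window = window
        self.loaded = batch  # posts rendered on the initial page load

    def scroll_to_bottom(self):
        # Render the next batch (if any) and return the new "page height".
        self.loaded = min(self.loaded + self.batch, self.total_posts)
        return self.loaded

    def hrefs(self):
        # Only posts inside the rendered window exist in the DOM.
        start = max(0, self.loaded - self.window)
        return [f"https://www.instagram.com/p/post{i}/" for i in range(start, self.loaded)]


page = FakeProfilePage()

# Same shape as the loop above: scroll until the height stops changing...
scrolldown = page.scroll_to_bottom()
match = False
while not match:
    last_count = scrolldown
    scrolldown = page.scroll_to_bottom()
    if last_count == scrolldown:
        match = True

# ...and only then collect the links.
posts = [href for href in page.hrefs() if "/p/" in href]
print(len(posts))  # prints 30
```

All 100 posts were loaded at some point, but only the 30 still in the window are found after the loop.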

Solution

  • It looks like you first scroll to the bottom of the page and only then collect the links, instead of collecting them inside the scrolling loop. Instagram appears to keep only a limited number of posts in the DOM and to remove older ones as new ones load, which would explain why you always end up with only the last 30 links.
    So, if you want to get all the links, run

    links = driver.find_elements(By.TAG_NAME, 'a')
    for link in links:
        post = link.get_attribute('href')
        if post and '/p/' in post:
            posts.append(post)

    inside the scrolling loop, and also once before the first scroll.
    Something like this:

    import time

    from selenium.webdriver.common.by import By

    SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"

    def get_links():
        # wait for newly loaded posts, then record every post link currently in the DOM
        time.sleep(2)
        for link in driver.find_elements(By.TAG_NAME, 'a'):
            post = link.get_attribute('href')
            if post and '/p/' in post:
                posts.add(post)  # a set ignores links that were already collected

    posts = set()
    get_links()  # links visible before the first scroll
    scrolldown = driver.execute_script(SCROLL_JS)
    match = False
    while not match:
        get_links()  # collect on every step, before older posts are removed
        last_count = scrolldown
        time.sleep(2)
        scrolldown = driver.execute_script(SCROLL_JS)
        if last_count == scrolldown:
            match = True
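The same idea can be checked without a browser. The sketch below moves the loop into a hypothetical helper, `collect_post_links`, that takes two callables instead of a driver, and runs it against a hypothetical `FakeProfilePage` that, like the profile grid is assumed to do, keeps only 30 posts in the DOM at a time. The order inside the loop is slightly rearranged (collect, scroll, wait) so that each collection happens after the pause, and `max_scrolls` is an added safety cap, not part of the original answer.

```python
import time
from typing import Callable, Iterable, Optional, Set


def collect_post_links(
    get_hrefs: Callable[[], Iterable[Optional[str]]],
    scroll_to_bottom: Callable[[], int],
    pause: float = 2.0,
    max_scrolls: int = 500,
) -> Set[str]:
    """Record post links on every scroll step, so links the page later
    removes from the DOM are not lost. Stops when the page height stops
    changing, or after `max_scrolls` steps (hypothetical safety cap)."""
    posts: Set[str] = set()
    last_height = None
    for _ in range(max_scrolls):
        # Record every post link currently in the DOM; the set drops duplicates.
        for href in get_hrefs():
            if href and "/p/" in href:  # <a> tags without an href give None
                posts.add(href)
        height = scroll_to_bottom()
        if height == last_height:
            break  # height did not change: no more posts to load
        last_height = height
        time.sleep(pause)  # let the next batch load before collecting again
    return posts


class FakeProfilePage:
    """Hypothetical virtualized grid: only the last `window` posts are in the DOM."""

    def __init__(self, total_posts=100, batch=12, window=30):
        self.total_posts = total_posts
        self.batch = batch
        self.window = window
        self.loaded = batch  # posts rendered on the initial page load

    def scroll_to_bottom(self):
        # Render the next batch (if any) and return the new "page height".
        self.loaded = min(self.loaded + self.batch, self.total_posts)
        return self.loaded

    def hrefs(self):
        start = max(0, self.loaded - self.window)
        posts = [f"https://www.instagram.com/p/post{i}/" for i in range(start, self.loaded)]
        # Non-post anchors: an ordinary navigation link and an <a> with no href.
        return posts + ["https://www.instagram.com/explore/", None]


page = FakeProfilePage()
links = collect_post_links(page.hrefs, page.scroll_to_bottom, pause=0)
print(len(links))  # prints 100
```

With a real Selenium 4 driver, the two callables could be `lambda: [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")`. Because elements can be removed between `find_elements` and `get_attribute`, Selenium may raise `StaleElementReferenceException` at that point; this sketch does not handle it.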