Tags: python-3.x, selenium, xpath, web-scraping, instagram

Followed an IG scraping tutorial and stuck on an XPath (or other) issue


I've been working off this tutorial here: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c

I tried to write a simpler version of the tutorial's "insta_details" function that gets the likes and the comment of an Instagram photo post, but I can't tell what's gone wrong with the code. I think I'm using XPaths wrongly (this is my first time), but the error I get is a "NoSuchElementException".

from selenium.webdriver import Chrome


def insta_details(urls):
    browser = Chrome()
    post_details = []
    for link in urls:
        browser.get(link)
        likes = browser.find_element_by_partial_link_text('likes').text
        age = browser.find_element_by_css_selector('a time').text
        xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
        comment = browser.find_element_by_xpath(xpath_comment).text
        insta_link = link.replace('https://www.instagram.com/p', '')
        post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
    return post_details


urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)

Error Message:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}

Copying and pasting the code from the tutorial hasn't worked for me either. Am I calling the function wrong, or is the problem somewhere else in the code?
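For context on why lookups like these raise NoSuchElementException, here is a stdlib-only sketch using xml.etree.ElementTree on hypothetical, heavily simplified markup (not Instagram's real HTML). The helpers partial_link_text and by_path are illustrative stand-ins, not Selenium APIs. It shows two failure modes: Selenium's link-text locators only search <a> elements, so text inside a <button> is never found; and a path copied step by step from the browser's dev tools stops matching as soon as the page gains or loses a wrapper element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup loosely modelled on the answer's XPath
# (section/div/div/button/span); the real page is far more complex.
OLD_PAGE = """
<article>
  <section>
    <div><div><button><span>4 likes</span></button></div></div>
  </section>
</article>
"""

# The same content after a hypothetical redesign wraps it in one extra <div>.
NEW_PAGE = """
<article>
  <div>
    <section>
      <div><div><button><span>4 likes</span></button></div></div>
    </section>
  </div>
</article>
"""


def partial_link_text(root, text):
    """Mimic a partial-link-text lookup: only <a> elements are searched."""
    for a in root.iter('a'):
        if text in ''.join(a.itertext()):
            return a
    return None  # Selenium would raise NoSuchElementException here


def by_path(root, path):
    """Return the text at a relative ElementTree path, or None if nothing matches."""
    element = root.find(path)
    return None if element is None else element.text


old = ET.fromstring(OLD_PAGE.strip())
new = ET.fromstring(NEW_PAGE.strip())

# The like count sits in a <button>, not an <a>, so a link-text search finds nothing.
print(partial_link_text(old, 'likes'))                # None
# A fixed step-by-step path works on the markup it was copied from...
print(by_path(old, './section/div/div/button/span'))  # 4 likes
# ...but stops matching once one wrapper element is added.
print(by_path(new, './section/div/div/button/span'))  # None
# A shorter path anchored on a distinctive element survives the change.
print(by_path(new, './/button/span'))                 # 4 likes
```

The same reasoning applies to Selenium: prefer short, relative locators anchored on something distinctive over long absolute paths, and expect any copied XPath to need updating when the site changes.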


Solution

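  • Looking at the tutorial, your code is incomplete and its locators don't match the page. find_element_by_partial_link_text only searches <a> elements, but on these posts the like count appears to sit inside a button (note the button/span at the end of the first XPath below), so that lookup raises NoSuchElementException. The tutorial's version finds the like count by XPath instead, with a fallback for video posts, and uses a different XPath for the comment.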

    Here, try this:

    import re
    import time

    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    # Absolute XPaths copied from the page at the time of writing; they break
    # whenever Instagram changes its markup.
    PHOTO_LIKES_XPATH = ('/html/body/div[1]/section/main/div/div/article/'
                         'div[3]/section[2]/div/div/button/span')
    VIDEO_VIEWS_XPATH = ('/html/body/div[1]/section/main/div/div/article/'
                         'div[3]/section[2]/div/span/span')
    COMMENT_XPATH = ('/html/body/div[1]/section/main/div/div[1]/article/'
                     'div[3]/div[1]/ul/div/li/div/div/div[2]/span')


    def find_mentions_or_hashtags(comment, pattern):
        """Return all matches as a list, a single match as a str, or "" if none."""
        mentions = re.findall(pattern, comment)
        if len(mentions) > 1:
            return mentions
        elif len(mentions) == 1:
            return mentions[0]
        else:
            return ""


    def insta_link_details(url):
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        browser = Chrome(options=chrome_options)
        # The page is rendered by JavaScript, so poll up to 10 s for each element.
        browser.implicitly_wait(10)
        try:
            browser.get(url)
            try:
                # Photo posts: the like count is inside a button.
                likes = browser.find_element(By.XPATH, PHOTO_LIKES_XPATH).text.split()[0]
                post_type = 'photo'
            except NoSuchElementException:
                # Video posts store the like/view count in a different element.
                likes = browser.find_element(By.XPATH, VIDEO_VIEWS_XPATH).text.split()[0]
                post_type = 'video'
            age = browser.find_element(By.CSS_SELECTOR, 'a time').text
            comment = browser.find_element(By.XPATH, COMMENT_XPATH).text
        finally:
            # Always close the headless browser, even if a lookup fails.
            browser.quit()

        hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
        mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
        return {'link': url, 'type': post_type, 'likes/views': likes,
                'age': age, 'comment': comment, 'hashtags': hashtags,
                'mentions': mentions}


    for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
        print(insta_link_details(url))
        time.sleep(10)  # pause between posts so requests aren't fired back to back
    

    Output:

    {'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\n🌿The Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements. 💪 Even at our busiest, there’s always a way to take care of our health. 💙\n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
    {'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on! 😘 Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty 💋 Find 5% rewards for purchases at:\n\n💙 Shaver Shop\n💙 Fresh Fragrances\n💙 Happy Hair Brush\n💕 & many more online at our website bio 👆!\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}
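Note that 'hashtags' comes back as a plain string ('#sosure') for the first post but as a list for the second, and 'mentions' is '' when there are none, because find_mentions_or_hashtags returns a different type depending on the match count. If you plan to process these fields further, a hypothetical variant (find_all_tags is an illustrative name, not from the tutorial) that always returns a list may be easier to work with. This sketch keeps the answer's [A-Za-z]+ character class, so digits and underscores in tags are still not captured.

```python
import re


def find_all_tags(comment, prefix):
    """Return every hashtag or mention in comment as a list (possibly empty).

    Hypothetical variant of find_mentions_or_hashtags: callers always get a
    list, never a str or "".
    """
    # Same [A-Za-z]+ character class as the answer's patterns.
    return re.findall(re.escape(prefix) + '[A-Za-z]+', comment)


comment = 'Natural skincare with @thebodyshopaust and @thebeautychef. #sosure'
print(find_all_tags(comment, '#'))         # ['#sosure']
print(find_all_tags(comment, '@'))         # ['@thebodyshopaust', '@thebeautychef']
print(find_all_tags('no tags here', '#'))  # []
```

Swapping '[A-Za-z]+' for '\w+' would also capture tags containing digits or underscores, at the cost of changing the output shown above.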