Search code examples
web-scrapingscrapyinstagram

How to Scrape All posts of hashtag in instagram


I want to Scrape All the Post Containing some #hashtag from Instagram

I tried it from : https://www.instagram.com/explore/tags/perfume/?__a=1

But It's only giving some posts not every post.


Solution

  • Look carefully at the json you receive.

    Navigate to graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor

    That's the identifier you have to use to specify the next batch of medias, like this:

    https://www.instagram.com/explore/tags/perfume/?__a=1&max_id=QVFDNWJDZnpGbElpdEV5Q19aaldYWUsxZnc1YUd0Z21yNUZsOWw4V2NxX05ZWnZjT2pRb3lrY29ocDJnM0VNallUWGZVeDIxVURnUzltdHpBR1A1a0VRNw==

    You can iterate this process to get more medias for requested hashtag.

    A naive example with requests (python3) to extract first 10 batches.

    import requests
    import json
    from time import sleep
    
    max_id = ''
    
    base_url = "https://www.instagram.com/explore/tags/perfume/?__a=1"
    for i in range(0, 10):
        sleep(2) # Be polite.
    
        if max_id:
            url = base_url + f"&max_id={max_id}"
        else:
            url = base_url
    
        print(f"Requesting {url}")
        response = requests.get(url)
        response = json.loads(response.text)
        try:
            max_id = response['graphql']['hashtag']['edge_hashtag_to_media']['page_info']['end_cursor']
            print(f"New cursor is {max_id}")
        except KeyError:
            print("There's no next page!")
            break
    

    As said in comment, be polite. Instagram will block you if you shoot too many requests per second.