selenium, web-scraping, instagram-api

Downloading public data from Instagram for research


I am doing research for which I need to download Instagram data. At first I tried the Instagram API, but it now caps both the number of posts that can be downloaded per API call and the number of API calls per day, which makes it unusable for my work. I also tried instagram-scraper, which is unable to download larger amounts of data. I finally turned to web scraping using Selenium with Python, which worked well for scraping the usernames of about 15,000 public profiles relevant to my research.

However, because of the dynamic way in which Instagram loads its web pages, I am unable to scrape the links to users' posts. My code keeps pressing Tab and extracting the post link (the URL of a page that shows only a single post) from each focused element. Instagram, however, stops loading images (I cannot scroll any further) after a certain number of posts or a certain amount of time. Is there any other way I can do this?

I would also like to know whether this is legal and whether I will be able to publish this data later on, as most researchers do.

Can I buy this data somehow? If so, how much is it going to cost me, and what are the sources?


Solution

  • I did something very similar to what you did, so I thought I could share some thoughts and answer some of your questions:

    1st: I'm pretty sure it's illegal (I will try to add a link to Instagram's policy), and Instagram strongly objects to crawling and scraping of its properties. So buying this data is also out of the question unless you want to get your hands dirty.

    2nd: Yes, Instagram regularly changes the signature on the links to its photos and videos. Thankfully, the links to posts and profiles stay the same. The best you can do is to open the post's web page as quickly as possible (before the signature expires) and download what you need.

    3rd: The link's signature is generated by JavaScript code, so if you download only the raw page source you get nothing. You actually need a JavaScript engine to load and render the web page for you.

    4th: I'm not sure your post counts as a true Stack Overflow question. It seems more like a guide to me than a question.

    And last, I was not able to find any method for loading earlier posts other than scrolling to the bottom of the page. You have to scroll and wait for more posts to fill the page, and it is quite common for Instagram to stop loading more posts, so implement a timeout mechanism of your own.
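
The scroll-and-wait loop with a timeout described above could be sketched roughly as follows. This is only an illustration, not a tested Instagram scraper: the function name `collect_post_links`, its parameters, and the CSS selector `a[href*='/p/']` are assumptions made for the example, and the real page markup may differ or change. The only driver method it relies on is Selenium's `execute_script(script, *args)`, so any object exposing that method works.

```python
import time

# JavaScript executed in the page: return the href of every anchor that
# matches the CSS selector passed in as arguments[0].
_GET_HREFS = (
    "return Array.from(document.querySelectorAll(arguments[0]))"
    ".map(a => a.href);"
)
# JavaScript executed in the page: jump to the bottom to trigger lazy loading.
_SCROLL_DOWN = "window.scrollTo(0, document.body.scrollHeight);"


def collect_post_links(driver, link_selector="a[href*='/p/']",
                       max_idle_seconds=30.0, poll_seconds=2.0):
    """Scroll down repeatedly and collect post links in first-seen order.

    The loop gives up once no new link has appeared for max_idle_seconds,
    which covers the case where the page silently stops loading more posts.
    The default selector is a guess at how post links look, not a
    documented contract.
    """
    links = []
    seen = set()
    last_progress = time.monotonic()

    while time.monotonic() - last_progress < max_idle_seconds:
        # Harvest links on every pass: earlier posts may be dropped from
        # the DOM as newer ones are loaded.
        hrefs = driver.execute_script(_GET_HREFS, link_selector) or []
        for href in hrefs:
            if href not in seen:
                seen.add(href)
                links.append(href)
                last_progress = time.monotonic()  # reset the idle timer

        driver.execute_script(_SCROLL_DOWN)
        time.sleep(poll_seconds)  # give the page time to load more posts

    return links
```

With Selenium installed, this could be called as `collect_post_links(driver)` after creating a driver (for example `driver = webdriver.Chrome()`) and opening a profile page with `driver.get(...)`. Measuring idle time rather than total time means a long profile is not cut off while new posts are still arriving.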