For a project, I want to gather coauthorship data from ResearchGate.
I am completely new to web scraping, and Scrapy was recommended to me for this project. I want to start from this URL (https://www.researchgate.net/scientific-contributions/Gregory-Phelan-2126234043) and scrape the coauthors listed there, then scrape their coauthors in turn, and so on, until I have built a coauthorship network.
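To make the goal concrete, here is a rough sketch of the crawl I have in mind, written as a plain breadth-first search. It is not Scrapy code: `fetch_coauthors` is a hypothetical placeholder for whatever ends up downloading and parsing a profile page, and `max_depth` is an assumed cut-off so the crawl stops at some point.

```python
from collections import deque
from typing import Callable, Iterable


def build_coauthor_network(
    start: str,
    fetch_coauthors: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> set[frozenset[str]]:
    """Breadth-first crawl returning undirected coauthor edges.

    fetch_coauthors is a hypothetical callback that takes an author
    identifier (e.g. a profile URL) and returns that author's coauthors.
    """
    edges: set[frozenset[str]] = set()
    visited = {start}
    queue = deque([(start, 0)])

    while queue:
        author, depth = queue.popleft()
        if depth >= max_depth:
            # Authors at the cut-off depth are recorded but not expanded.
            continue
        for coauthor in fetch_coauthors(author):
            if coauthor == author:
                continue  # ignore self-references
            edges.add(frozenset((author, coauthor)))
            if coauthor not in visited:
                visited.add(coauthor)
                queue.append((coauthor, depth + 1))

    return edges


# Toy data standing in for real pages (made up for illustration only).
toy_pages = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": ["E"]}
network = build_coauthor_network("A", lambda a: toy_pages.get(a, []), max_depth=2)
```

With the toy data, authors two steps away from the start (here "D") show up in the network but are not expanded further, which is the behaviour I would want from the real crawl as well.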
I have been trying to fetch this URL with Scrapy, e.g. by calling fetch('url') inside the Scrapy shell and by running scrapy shell 'url' in Windows PowerShell, but this returned the following:
After some research, I installed Docker and combined Scrapy with Splash. I then retried opening a Scrapy shell with the URL, but this time I ran (again in PowerShell)
This first seemed to work, as the output changed to
However, after running response.css('title') to get the title, it returned
Part of the response.text output is also:
So to me, it seems that Scrapy is somehow unable to reach this page.
I also read about setting a USER_AGENT when starting the shell, so I first tried my own user agent and then several randomly generated ones (using UserAgent()), but this did not change the outcome.
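For reference, a minimal sketch of one way to pass a user agent at shell start-up, assuming Scrapy's `-s NAME=VALUE` command-line option for overriding a setting. The helper name `scrapy_shell_command` and the user-agent string are hypothetical, and this is not necessarily the exact command I ran. Building the command as a list avoids quoting problems in PowerShell.

```python
import subprocess


def scrapy_shell_command(url: str, user_agent: str) -> list[str]:
    """Build the argument list for `scrapy shell` with a custom USER_AGENT.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    return ["scrapy", "shell", "-s", f"USER_AGENT={user_agent}", url]


# Example values; the user-agent string is a made-up placeholder.
command = scrapy_shell_command(
    "https://www.researchgate.net/scientific-contributions/Gregory-Phelan-2126234043",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
)

if __name__ == "__main__":
    # Opens the interactive shell; requires Scrapy to be installed.
    subprocess.run(command, check=False)
```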
Does anyone have suggestions for how to successfully fetch this page and start scraping?
I am using Python 3.11.5 and Scrapy 2.11.0.