I would like to do web scraping on this site (stackoverflow.com), and I was wondering if there is an API or some other tool that can be used with Python to get all the comments containing a specific tag.
For example, how do I get all the posts and comments from 10/01/2019 to 01/20/2019 with the python tag?
Have a detailed look at https://api.stackexchange.com/docs/
You can get all questions from a start date to an end date with a particular tag by making use of the questions method. You need to pass the specific tag into the tagged parameter.
Here is the URL format for that:
https://api.stackexchange.com/2.2/questions?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&tagged={tag}&site=stackoverflow
For example, the link below returns all questions from 1 July 2019 to 5 July 2019 with the tag python:
https://api.stackexchange.com/2.2/questions?fromdate=1561939200&todate=1562284800&order=desc&sort=activity&tagged=python&site=stackoverflow
For more information on how the dates are formatted in the above URL, have a look at the dates page of the API documentation. In short, fromdate and todate are Unix timestamps in seconds (UTC).
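As a minimal sketch of the step above, the following Python builds the questions URL from calendar dates. The helper names to_epoch and questions_url are made up for illustration and are not part of the API; the parameters are the ones shown in the URL format above, and the dates are assumed to mean midnight UTC.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def to_epoch(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def questions_url(tag, start, end, site="stackoverflow"):
    """Build the /questions URL for one tag and a [start, end] epoch range."""
    params = {
        "fromdate": start,
        "todate": end,
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"


# 1 July 2019 to 5 July 2019, tag "python"
url = questions_url("python", to_epoch(2019, 7, 1), to_epoch(2019, 7, 5))
print(url)
```

With these inputs the printed URL should be identical to the example link above.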
Now that you have the question_id, you can make use of the questions/{ids}/answers method to get all answers to that question from a start date to an end date.
Here is the URL format for that:
https://api.stackexchange.com/2.2/questions/{question_id}/answers?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&site=stackoverflow
For example, the link below returns all answers from 1 January 2019 to 1 July 2019 for the question with question_id 37181281:
https://api.stackexchange.com/2.2/questions/37181281/answers?fromdate=1546300800&todate=1561939200&order=desc&sort=activity&site=stackoverflow
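Here is a rough sketch of how the answers could be collected in Python using only the standard library. The function names (answers_url, fetch_json, all_answers) are hypothetical. The sketch assumes the API's usual response wrapper, with an items list and a has_more flag, and its page and pagesize parameters; it also assumes responses may arrive gzip-compressed, so it decompresses them when the Content-Encoding header says so.

```python
import gzip
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def answers_url(question_id, start, end, page=1, site="stackoverflow"):
    """Build the /questions/{id}/answers URL for one page of results."""
    params = {
        "fromdate": start,
        "todate": end,
        "order": "desc",
        "sort": "activity",
        "site": site,
        "page": page,
        "pagesize": 100,  # assumed maximum page size
    }
    return f"{API_ROOT}/questions/{question_id}/answers?{urlencode(params)}"


def fetch_json(url):
    """Download a URL and decode the JSON body (gunzipping if needed)."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def all_answers(question_id, start, end, fetch=fetch_json):
    """Follow has_more across pages and return every answer item."""
    items = []
    page = 1
    while True:
        data = fetch(answers_url(question_id, start, end, page))
        items.extend(data.get("items", []))
        if not data.get("has_more"):
            return items
        page += 1
```

Calling all_answers(37181281, 1546300800, 1561939200) would request the same data as the example link above, one page at a time. The fetch argument is there only so the paging loop can be tried without network access.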
Now you basically have all the posts (questions and answers) from a start date to an end date with a particular tag.
Since you have the question_id and answer_id for the posts, you can make use of the questions/{ids}/comments and answers/{ids}/comments methods to get the comments on these posts.
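The comment URLs can be built the same way. The sketch below is one possible approach, with hypothetical helper names (chunked, comments_urls). It assumes that {ids} accepts several ids separated by semicolons, up to 100 per request, and that comments can be sorted by creation; check both against the API documentation before relying on them.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def chunked(ids, size=100):
    """Split ids into lists of at most `size` elements."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def comments_urls(kind, ids, site="stackoverflow"):
    """Build comment URLs for question or answer ids, one URL per batch.

    kind must be "questions" or "answers".
    """
    if kind not in ("questions", "answers"):
        raise ValueError("kind must be 'questions' or 'answers'")
    query = urlencode({"order": "desc", "sort": "creation", "site": site})
    urls = []
    for batch in chunked(ids):
        joined = ";".join(str(i) for i in batch)  # semicolon-delimited ids
        urls.append(f"{API_ROOT}/{kind}/{joined}/comments?{query}")
    return urls


for u in comments_urls("questions", [37181281]):
    print(u)
```

Each URL can then be downloaded and its items list read in the same way as for questions and answers. Batching ids keeps the number of requests low, which matters because the API limits how many requests a client can make per day.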