How to extract comments body from GitHub with Python


I would like to extract only the bodies of issue comments using the GitHub API v3 (though I am also open to other solutions).

So far I have tried requesting this address (as an example): https://api.github.com/repos/bitcoin/bitcoin/issues/comments?per_page=50&since=2018-02-19T00:00:00Z&until=2019-04-20T00:00:00Z

After entering the address I can read all the returned data (HTML), but GitHub does not let me view more than 100 results at a time. Can this limit be exceeded? Can only the comment bodies be obtained?


Solution

  • GitHub's API offers pagination. You can't request more than 100 comments in a single request, but you can make multiple requests to retrieve more than 100 comments:

    Requests that return multiple items will be paginated to 30 items by default. You can specify further pages with the ?page parameter. For some resources, you can also set a custom page size up to 100 with the ?per_page parameter.

    The API also includes a Link HTTP header that tells you about interesting pages, e.g. what the next and last pages are.

    Can only the comment bodies be obtained?

    I'm not aware of any way to do this using the v3 / REST API. It may be possible using the v4 / GraphQL API, but please note that this uses a completely different model.
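
As a rough sketch of the pagination approach described above: request one page at a time, follow the `rel="next"` URL from the Link header until there is none, and keep only the `body` field of each comment on the client side. The REST API still sends the full comment objects, so this filters after download rather than asking the server for bodies only. The function names (`next_page_url`, `http_get`, `iter_comment_bodies`) and the User-Agent string are made up for this example, and the URL is the one from the question with `per_page` raised to 100. It uses only the standard library and assumes the endpoint returns a JSON list of comment objects.

```python
import json
import re
import urllib.request

# URL from the question, with per_page raised to the documented maximum of 100.
API_URL = (
    "https://api.github.com/repos/bitcoin/bitcoin/issues/comments"
    "?per_page=100&since=2018-02-19T00:00:00Z&until=2019-04-20T00:00:00Z"
)


def next_page_url(link_header):
    """Return the URL marked rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    # Each entry looks like: <https://...>; rel="next"
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if rel == "next":
            return url
    return None


def http_get(url):
    """Fetch one page and return (parsed JSON, Link header value or None)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "comment-body-example",  # arbitrary name for this sketch
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def iter_comment_bodies(url, fetch=http_get):
    """Yield the body of every comment, following rel="next" links page by page."""
    while url:
        comments, link_header = fetch(url)
        for comment in comments:
            yield comment["body"]
        url = next_page_url(link_header)
```

Looping over `iter_comment_bodies(API_URL)` then gives one comment body per iteration. The `fetch` parameter exists only so the paging logic can be tried with canned data instead of live requests. Unauthenticated requests are rate-limited (60 per hour), so a long date range may need an access token sent in an `Authorization` header.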