I'm writing a Python script to generate how many changes were made within a timeframe for all projects, but when I use the Gerrit REST API I can only get up to a maximum of 500 unique users, and I want to see all of them, even if I use a long timeframe (1 year). This is my function for the API:
import json
import requests

def requestAPICall(url):
    """
    does API stuff
    """
    response = requests.get(url)
    if response.status_code == 200:
        JSON_response = json.loads(response.text[4:])  # skip the ")]}'" prefix Gerrit prepends
        generateJSON(JSON_response)
        return (JSON_response, True)
    print("Error Occurred")
    return (response, False)
This is the link I used for the request in this case: https://chromium-review.googlesource.com/changes/?q=since:%222022-01-01%2011:26:25%20%2B0100%22+before:%222023-01-01%2011:31:25%20%2B0100%22
I have tried curl commands, but I do not know whether that works.
There is a default limit on the number of returned items, and if you're making anonymous queries I don't believe you can change this. From the documentation:
The query string must be provided by the q parameter. The n parameter can be used to limit the returned results. The no-limit parameter can be used to remove the default limit on queries and return all results (does not apply to anonymous requests). This might not be supported by all index backends.
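For example, here is a minimal sketch (not from the original post) of how those parameters are passed with requests; the q, n and no-limit names come from the documentation above, and no-limit only takes effect on authenticated requests:

import json
import requests

base = "https://chromium-review.googlesource.com/changes/"
query = 'since:"2022-01-01 00:00:00" before:"2023-01-01 00:00:00"'

# Anonymous request: the result count can be capped explicitly with n,
# but the server-side maximum still applies.
res = requests.get(base, params={"q": query, "n": 25})
changes = json.loads(res.text[4:])  # skip the ")]}'" prefix Gerrit prepends

# With an authenticated request you could instead pass no-limit to lift the
# cap entirely (assuming the backend supports it), e.g.
# params={"q": query, "no-limit": "true"}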
However, you can return paginated results using the start parameter:

If the number of changes matching the query exceeds either the internal limit or a supplied n query parameter, the last change object has a _more_changes: true JSON field set. The S or start query parameter can be supplied to skip a number of changes from the list.

So if the last change in the result has _more_changes: true set, you can make a subsequent request using the start parameter.
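To make that concrete, after json.loads() the tail of a truncated page looks roughly like this (a hypothetical, heavily trimmed example; field names follow the ChangeInfo documentation):

truncated_page = [
    # ... earlier ChangeInfo entries ...
    {
        "project": "chromium/src",
        "status": "MERGED",
        "subject": "Some change subject",
        "_number": 1234567,
        "_more_changes": True,  # set only on the last entry of a truncated page
    },
]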
That means your Python code is going to look something like:
import json
import requests
import sys


class Gerrit:
    """Wrap up Gerrit API functionality in a simple class to make
    it easier to consume from our code. This limited example only
    supports the `changes` endpoint.

    See https://gerrit-review.googlesource.com/Documentation/rest-api.html
    for complete REST API documentation.
    """

    def __init__(self, baseurl):
        self.baseurl = baseurl

    def changes(self, query, start=None, limit=None, options=None):
        """This implements the API described in [1].

        [1]: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html
        """
        params = {"q": query}
        if start is not None:
            params["S"] = start
        if limit is not None:
            params["n"] = limit
        if options is not None:
            params["o"] = options

        res = requests.get(f"{self.baseurl}/changes", params=params)
        print(f"fetched [{res.status_code}]: {res.url}", file=sys.stderr)
        res.raise_for_status()
        return json.loads(res.text[4:])
# And here is an example in which we use the Gerrit class to perform a
# query against https://chromium-review.googlesource.com. This is similar
# to the query in your question, but using a constrained date range in order
# to limit the total number of results.
g = Gerrit("https://chromium-review.googlesource.com")

all_results = []
start = 0
while True:
    res = g.changes(
        'since:"2022-12-31 00:00:00" before:"2023-01-01 00:00:00"',
        limit=200,
        start=start,
    )
    if not res:
        break

    all_results.extend(res)
    if not res[-1].get("_more_changes"):
        break

    start += len(res)

# Here we're just dumping all the results as a JSON document on
# stdout.
print(json.dumps(all_results))
This demonstrates how to use the limit parameter to control the number of results returned in a "page", and the start parameter to request additional pages of results.
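Once all_results holds every page, the per-project and unique-user counts the question is after can be computed with a sketch like this (it assumes the default query output, where each change carries a project name and an owner with an _account_id; richer account details would need the o=DETAILED_ACCOUNTS option):

from collections import Counter

# Count changes per project across everything we fetched.
changes_per_project = Counter(change["project"] for change in all_results)

# Collect the distinct owner account ids seen in the results.
unique_owners = {change["owner"]["_account_id"] for change in all_results}

print(f"{len(all_results)} changes across {len(changes_per_project)} projects")
print(f"{len(unique_owners)} unique change owners")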
But look out! The example query here covers only a couple of days and returns over 3000 results; I suspect that any attempt to fetch a year's worth of data, particularly over an anonymous connection, is going to run into some sort of server rate limit.
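If a single year-long query does prove too heavy, one option (just a sketch reusing the Gerrit class above; month_windows is a hypothetical helper and the pause length is a guess) is to split the range into monthly windows and paginate each window separately:

import datetime
import time

def month_windows(year):
    """Yield (since, before) ISO date strings covering each month of `year`."""
    for month in range(1, 13):
        since = datetime.date(year, month, 1)
        before = datetime.date(year + 1, 1, 1) if month == 12 else datetime.date(year, month + 1, 1)
        yield since.isoformat(), before.isoformat()

all_results = []
for since, before in month_windows(2022):
    start = 0
    while True:
        res = g.changes(
            f'since:"{since} 00:00:00" before:"{before} 00:00:00"',
            limit=200,
            start=start,
        )
        if not res:
            break
        all_results.extend(res)
        if not res[-1].get("_more_changes"):
            break
        start += len(res)
    time.sleep(1)  # small pause between windows to stay friendly to the server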