python github github-api rate-limiting

How to handle the GitHub API rate limit by waiting?


I am trying to use the GitHub API with an authorization token to retrieve some information about repositories. However, it hits the rate limit after a few runs. I want to handle this by waiting after the exception is raised, but even though I decrement i in the except block, the next iteration raises the same exception and i still increases without any element being appended.

For example, the first time the exception is raised, i is 12 and the repositories list has 330 elements. The code keeps running, and when the exception is raised a second time, i is 13 but repositories still has 330 elements.

repositories = []
# There are 30 repos in every page, so with 33 iterations we get 990 java repositories.
for i in range(0, 33):
    try:
        url = "https://api.github.com/search/repositories?q=language:java&sort=forks&order=desc&page=" + str(i)
        response = requests.get(url=url, headers=headers).json()
        for repository in response["items"]:
            dt_string = datetime.now().strftime("%d/%m/%Y %H:%M:%S")
            repositories.append({"name": repository["full_name"], "link": repository["html_url"], "date": dt_string})
    except:
        print("exception")
        time.sleep(5)
        i = i-1
print("Repositories taken.")
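The behaviour described above comes from how a Python for loop works: on every iteration i is rebound to the next value produced by range(), so `i = i-1` in the except block is thrown away. The sketch below shows this without any network calls. The function names and the simulated failure are hypothetical, made up only for illustration.

```python
def pages_visited_for():
    """Record the value of i on each pass of a for loop that tries to step back."""
    visited = []
    for i in range(3):
        visited.append(i)
        i = i - 1  # discarded: the next value of i comes from range(), not from here
    return visited


def pages_visited_while(fail_once_on=1):
    """Same walk with a while loop, simulating one failed request on a given page."""
    visited = []
    failed = False
    i = 0
    while i < 3:
        visited.append(i)
        if i == fail_once_on and not failed:
            failed = True  # pretend this request hit the rate limit
            continue       # i is unchanged, so the same page is tried again
        i += 1
    return visited


print(pages_visited_for())    # [0, 1, 2]
print(pages_visited_while())  # [0, 1, 1, 2]
```

With the while loop the counter only advances after a successful pass, so a failed page is requested again instead of being skipped.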

Solution

  • Adding a while loop solved my problem and handled the exception properly. Also, the API sometimes returns the same repositories, so I keep a set of names and skip a repository if its full_name is already stored.

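    import time
    from datetime import datetime

    import requests

    # headers is the same dict as in the question (it carries the authorization token).
    repositories = []
    names = set()  # full_name values already stored, used to skip duplicates
    i = 1  # GitHub page numbers start at 1
    while len(repositories) < 1000:
        try:
            url = "https://api.github.com/search/repositories?q=language:java&sort=forks&order=desc&page=" + str(i)
            response = requests.get(url=url, headers=headers).json()
            for repository in response["items"]:
                item = repository["full_name"]
                if item not in names:
                    names.add(item)
                    dt_string = datetime.now().strftime("%d/%m/%Y %H:%M:%S")
                    repositories.append({"name": repository["full_name"], "link": repository["html_url"],
                                         "default_branch": repository["default_branch"],
                                         "stargazers_count": repository["stargazers_count"],
                                         "forks_count": repository["forks_count"], "date": dt_string})
            print(i)
            i = i + 1
        except (KeyError, ValueError, requests.RequestException):
            # A rate-limited response has no "items" key: wait, then resume
            # from the page that follows the repositories stored so far.
            time.sleep(5)
            i = len(repositories) // 30 + 1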
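
A possible refinement of the fixed 5-second sleep is to wait for as long as the response headers suggest. GitHub's REST API reports rate-limit state in the `retry-after`, `x-ratelimit-remaining` and `x-ratelimit-reset` headers. The helper below is only a sketch: the name `seconds_to_wait`, its parameters, and the 5-second fallback are assumptions for illustration and not part of the original solution.

```python
import time


def seconds_to_wait(headers, now=None, default=5.0):
    """Estimate how long to sleep before retrying, from rate-limit headers.

    headers: a mapping of response header names to string values.
    now:     current UNIX time; taken from time.time() when omitted.
    default: fallback wait (assumed value) when no header gives a hint.
    """
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, plain dict keys are not.
    lowered = {key.lower(): value for key, value in headers.items()}

    # retry-after, when present, gives the number of seconds to wait.
    if "retry-after" in lowered:
        return float(lowered["retry-after"])

    # Otherwise, if the quota is used up, wait until the reset time (epoch seconds).
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)

    return default


print(seconds_to_wait({"Retry-After": "30"}))
print(seconds_to_wait({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}, now=1000))
```

To use it in the loop above, the response object would have to be kept before calling .json() so that its headers are still available in the except branch, and `time.sleep(5)` would become `time.sleep(seconds_to_wait(resp.headers))`, where `resp` is a hypothetical name for that saved response.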