
Index out of range when sending requests in a loop


I get an index out of range error when I try to get the number of contributors of a GitHub project in a loop. After some iterations (which work perfectly), it just throws that exception. I have no clue why...

    for x in range(100):
        r = requests.get('https://github.com/tipsy/profile-summary-for-github')  
        xpath = '//span[contains(@class, "num") and following-sibling::text()[normalize-space()="contributors"]]/text()'
        contributors_number = int(html.fromstring(r.text).xpath(xpath)[0].strip().replace(',', ''))
        print(contributors_number) # prints the correct number until the exception

Here's the exception:

    ----> 4     contributors_number = int(html.fromstring(r.text).xpath(xpath)[0].strip().replace(',', ''))
    IndexError: list index out of range
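A small offline reproduction makes the failure mode concrete: when the page does not contain the element, xpath() returns an empty list, and indexing that list with [0] raises exactly this IndexError. In the sketch below, both HTML strings are hypothetical stand-ins (not real GitHub responses), and contributors_or_none is an illustrative helper name, not part of any library.

```python
from lxml import html

# The XPath from the question: the text of a <span class="num"> whose
# following sibling text node reads "contributors".
XPATH = ('//span[contains(@class, "num") and '
         'following-sibling::text()[normalize-space()="contributors"]]/text()')

# Hypothetical stand-ins: a page that has the counter, and an error page
# (like a rate-limit response body) that does not.
good_page = '<div><span class="num"> 1,234 </span> contributors</div>'
error_page = '<html><body><p>Too many requests</p></body></html>'


def contributors_or_none(page_text):
    """Return the contributor count, or None if the element is missing."""
    matches = html.fromstring(page_text).xpath(XPATH)
    if not matches:  # xpath() returns an empty list when nothing matches
        return None
    return int(matches[0].strip().replace(',', ''))


print(contributors_or_none(good_page))   # 1234
print(contributors_or_none(error_page))  # None

# Indexing the empty result directly reproduces the question's error.
try:
    html.fromstring(error_page).xpath(XPATH)[0]
except IndexError as exc:
    print('IndexError:', exc)
```

Checking for an empty result before indexing turns a crash into a value you can test for, which makes it easier to log what the server actually sent back.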

Solution

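  • It seems likely that you're getting a 429 Too Many Requests response, since you're firing requests one right after the other. The error page you get back then presumably doesn't contain the contributors element, so xpath() returns an empty list and indexing it with [0] raises the IndexError.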

    You might want to modify your code like this:

    import time

    import requests
    from lxml import html

    for index in range(100):
        r = requests.get('https://github.com/tipsy/profile-summary-for-github')
        xpath = '//span[contains(@class, "num") and following-sibling::text()[normalize-space()="contributors"]]/text()'
        contributors_number = int(html.fromstring(r.text).xpath(xpath)[0].strip().replace(',', ''))
        print(contributors_number)
        time.sleep(3)  # Wait a bit before firing off another request
    

    Better yet, check that the request succeeded before parsing the page:

    import time

    import requests
    from lxml import html

    for index in range(100):
        r = requests.get('https://github.com/tipsy/profile-summary-for-github')
        if r.status_code == 200:  # Only parse the page if the request was successful
            xpath = '//span[contains(@class, "num") and following-sibling::text()[normalize-space()="contributors"]]/text()'
            contributors_number = int(html.fromstring(r.text).xpath(xpath)[0].strip().replace(',', ''))
            print(contributors_number)
        else:
            print(f"Failed fetching page, status code: {r.status_code}")
        time.sleep(3)  # Wait a bit before firing off another request
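If the 429 diagnosis is right, a fixed three-second pause may still not be enough once the server starts throttling. A common alternative is to retry only when the server answers 429, waiting longer after each failure. The following is a minimal sketch, assuming the requests library from the question; get_with_backoff, max_retries, base_delay and the injectable sleep parameter are hypothetical names chosen for illustration. It also assumes a Retry-After header, if sent, holds a whole number of seconds (the header may instead carry an HTTP date, which this sketch ignores in favour of the computed delay).

```python
import time

import requests


def get_with_backoff(url, max_retries=5, base_delay=3.0, session=None,
                     sleep=time.sleep):
    """GET url, retrying with exponential backoff while the server returns 429.

    Waits base_delay * 2**attempt seconds between attempts, unless the
    response carries a Retry-After header holding a whole number of seconds,
    in which case that value is used instead. Raises requests.HTTPError if
    the final response has a 4xx or 5xx status.
    """
    session = session or requests.Session()
    for attempt in range(max_retries + 1):
        r = session.get(url)
        if r.status_code != 429 or attempt == max_retries:
            break  # success, a non-429 error, or out of retries
        retry_after = r.headers.get('Retry-After', '')
        if retry_after.isdigit():
            delay = float(retry_after)  # server-suggested wait (assumed seconds)
        else:
            delay = base_delay * 2 ** attempt  # 3, 6, 12, ... seconds
        sleep(delay)
    r.raise_for_status()
    return r
```

In the loop from the answer, you could create one requests.Session() before the loop, call get_with_backoff('https://github.com/tipsy/profile-summary-for-github', session=session) in place of requests.get(...), and keep the parsing and the time.sleep(3) as before. The sleep parameter exists only so the delay logic can be exercised without actually waiting.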