Tags: go, github, amazon-ec2, go-http

GitHub returns "Your access to this site has been restricted" to Go's HTTP client


I'm running into an issue when using Go's HTTP client to download a zip or tar.gz file from GitHub. The request returns a 403 with the message "Your access to this site has been restricted".

Curl works fine though.

I am running this on an AWS EC2 instance in the us-west-2 region, launched from this AMI:

Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-0807918df10edc141 (64-bit x86) / ami-0c75fb2e6a6be38f6 (64-bit Arm)


Sample code to reproduce:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Alternatively: https://api.github.com/repos/kubeflow/manifests/zipball/v0.12.0
    endpoint := "https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz"

    // Request the archive with the default client (no custom headers).
    resp, err := http.Get(endpoint)
    if err != nil {
        fmt.Printf("[error] %v\n", err)
        return
    }
    defer resp.Body.Close()

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
    respData, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Printf("[error] %v\n", err)
        return
    }

    // On the EC2 instance this prints a 403 status and an HTML error page.
    fmt.Printf("Status: %s\nResp:\n%v\n", resp.Status, string(respData))
}

Note: the code above works fine on my local machine; it only fails on the AWS instance.

Thanks!


Solution

  • That particular error message means that GitHub is restricting you because your requests match a pattern of ongoing abuse. GitHub blocks this pattern because it threatens availability for other users.

    You should always make your program send a custom User-Agent header, because that distinguishes your traffic from other people's. (After all, lots of people use Go, and they all share the default User-Agent.) You should obtain the URLs you're using via the API, not from github.com directly. You should also authenticate when possible (e.g., with a token), because GitHub gives authenticated requests higher limits, and if you cause a problem, GitHub can reach out to you. Finally, you should implement appropriate rate-limiting and throttling so that you don't make too many requests, and you should back off or stop completely if you get a 403, 429, or 5xx response.

    If you need to download many archives for the same repository, clone it and use git archive, which is far more efficient. Caching data instead of requesting it multiple times is also recommended.

    If you do all of these things, you'll probably find that your requests work.