
Golang url.Parse always returns Invalid Control Character URL


I'm trying to scrape a site through a proxy that I get from free-proxy-list.net and apply to my local HTTP requests in Go, but when I parse the proxy with url.Parse(), it always returns an Invalid Control Character URL error.

func getProxy() *url.URL {
    proxyUrl := "https://www.proxy-list.download/api/v1/get?type=http&country=US"
    client := &http.Client{}
    req, err := http.NewRequest("GET", proxyUrl, nil)
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error proxy ", err)
    }
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Error response body", err)
    }
    details := string(body)
    temp := strings.Split(details, "\n")
    fmt.Println("http://" + temp[rand.Intn(30)])
    checkProxy, err := url.Parse("http://" + temp[rand.Intn(10)])
    if err != nil {
        fmt.Println("Bad proxy URL", err)
    }

    return checkProxy
}

Solution

  • proxyUrl := "https://www.proxy-list.download/api/v1/get?type=http&country=US"
    

    The content served at this URL consists of lines in the format ip:port\r\n, i.e. the line delimiter is \r\n (DOS/Windows style).

    temp := strings.Split(details, "\n")
    

    This splits the content by \n, i.e. the UNIX style line delimiter. This leaves the \r from the DOS line delimiter in the string, resulting in ip:port\r.

    ... always return Invalid Control Character URL

    url.Parse is complaining about the \r left at the end of the line: a carriage return is an ASCII control character, which is not allowed in a URL.