I started learning Go recently and I've been chipping away at this for a while now, but figured it was time to ask for some specific help. My program requests paginated data from an API, and since there are about 160 pages of data it seemed like a good use of goroutines, except I have race conditions and I can't figure out why. It's probably because I'm new to the language, but my impression was that parameters are passed to a function as a copy of the caller's data unless they're pointers.
According to what I think I know, that should mean my data is copied, leaving me free to change it in the main function, but I end up requesting some pages multiple times and others just once.
My main.go
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"sync"

	"github.com/joho/godotenv"
)

func main() {
	err := godotenv.Load()
	if err != nil {
		log.Fatalln(err)
	}

	httpClient := &http.Client{}
	baseURL := "https://api.data.gov/ed/collegescorecard/v1/schools.json"
	filters := make(map[string]string)
	page := 0
	filters["school.degrees_awarded.predominant"] = "2,3"
	filters["fields"] = "id,school.name,school.city,2018.student.size,2017.student.size,2017.earnings.3_yrs_after_completion.overall_count_over_poverty_line,2016.repayment.3_yr_repayment.overall"
	filters["api_key"] = os.Getenv("API_KEY")

	outFile, err := os.Create("./out.txt")
	if err != nil {
		log.Fatalln(err)
	}
	writer := bufio.NewWriter(outFile)

	requestURL := getRequestURL(baseURL, filters)
	response := requestData(requestURL, httpClient)

	wg := sync.WaitGroup{}
	for (page+1)*response.Metadata.ResultsPerPage < response.Metadata.TotalResults {
		page++
		filters["page"] = strconv.Itoa(page)
		wg.Add(1)
		go func() {
			defer wg.Done()
			requestURL := getRequestURL(baseURL, filters)
			response := requestData(requestURL, httpClient)
			_, err = writer.WriteString(response.TextOutput())
			if err != nil {
				log.Fatalln(err)
			}
		}()
	}
	wg.Wait()
}

func getRequestURL(baseURL string, filters map[string]string) *url.URL {
	requestURL, err := url.Parse(baseURL)
	if err != nil {
		log.Fatalln(err)
	}

	query := requestURL.Query()
	for key, value := range filters {
		query.Set(key, value)
	}
	requestURL.RawQuery = query.Encode()

	return requestURL
}

func requestData(url *url.URL, httpClient *http.Client) CollegeScoreCardResponseDTO {
	request, _ := http.NewRequest(http.MethodGet, url.String(), nil)

	resp, err := httpClient.Do(request)
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()

	var parsedResponse CollegeScoreCardResponseDTO
	err = json.NewDecoder(resp.Body).Decode(&parsedResponse)
	if err != nil {
		log.Fatalln(err)
	}

	return parsedResponse
}
I know another issue I'll run into is writing to the output file in the correct order, but I believe using channels to tell each goroutine which request has finished writing could solve that. If I'm wrong about that, I'd appreciate any advice on how to approach it as well.
Thanks in advance.
Goroutines do not receive copies of data. When the compiler detects that a variable "escapes" the current function, it allocates that variable on the heap; filters is one such variable here. When the goroutine starts, the filters it accesses is the same map the main goroutine is using. (Even if you passed filters as a parameter, a Go map value is just a small header pointing at shared underlying storage, so the entries would still not be copied.) Since you keep modifying filters in the main goroutine without any locking, there is no guarantee of what each goroutine sees.

I suggest you keep filters read-only, create a new map inside each goroutine by copying all the items from filters, and add the "page" entry in the goroutine. You also have to be careful to pass a copy of page to the goroutine:
go func(page int) {
	// copy the shared filters map so this goroutine works on its own copy
	flt := make(map[string]string, len(filters)+1)
	for k, v := range filters {
		flt[k] = v
	}
	flt["page"] = strconv.Itoa(page)
	// ... build the request URL from flt and fetch/write the data as before ...
}(page)
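As for the ordering question at the end of your post: rather than having the goroutines coordinate with each other, it is usually simpler to have each one send its result on a channel tagged with its page number, and let the main goroutine write everything out in page order after wg.Wait(). Below is a minimal sketch of that idea. It assumes the CollegeScoreCardResponseDTO type and TextOutput() method from your code, and a totalPages value derived from response.Metadata the same way your loop condition is, so treat it as an outline rather than drop-in code; pageResult is a hypothetical helper type, not something from your program.

// pageResult pairs a page number with the text produced for that page.
type pageResult struct {
	page int
	text string
}

// totalPages is assumed to be computed from response.Metadata
// (TotalResults and ResultsPerPage), as in the original loop condition.
// Buffered so no goroutine ever blocks on send before we start reading.
results := make(chan pageResult, totalPages)

var wg sync.WaitGroup
for page := 1; page <= totalPages; page++ {
	wg.Add(1)
	go func(page int) {
		defer wg.Done()
		// Each goroutine builds its own copy of the filters map.
		flt := make(map[string]string, len(filters)+1)
		for k, v := range filters {
			flt[k] = v
		}
		flt["page"] = strconv.Itoa(page)
		resp := requestData(getRequestURL(baseURL, flt), httpClient)
		results <- pageResult{page: page, text: resp.TextOutput()}
	}(page)
}

wg.Wait()
close(results)

// Collect everything, then write the pages in order.
texts := make(map[int]string, totalPages)
for r := range results {
	texts[r.page] = r.text
}
for page := 1; page <= totalPages; page++ {
	if _, err := writer.WriteString(texts[page]); err != nil {
		log.Fatalln(err)
	}
}
writer.Flush() // remember to flush the bufio.Writer before exiting

Holding all ~160 pages in memory before writing is fine at this scale; for a much larger dataset you could write pages as they arrive and buffer only the out-of-order ones. You may also want to bound how many requests run at once (for example with a semaphore channel), since the API may rate-limit your key.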