Search code examples
goconcurrencygoroutine

Trouble figuring out data race in goroutine


I started learning go recently and I've been chipping away at this for a while now, but figured it was time to ask for some specific help. I have my program requesting paginated data from an api and because there are about 160 pages of data. Seems like a good use of goroutines, except I have race conditions and I can't seem to figure out why. It's probably because I'm new to the language, but my impressions was that params for a function are passed as a copy of the data in the function calling it unless it's a pointer.

According to what I think I know this should be making copies of my data which leaves me free to change it in the main function, but I end up request some pages multiple times and other pages just once.

My main.go

package main

import (
    "bufio"
    "encoding/json"
    "log"
    "net/http"
    "net/url"
    "os"
    "strconv"
    "sync"

    "github.com/joho/godotenv"
)

func main() {
    err := godotenv.Load()
    if err != nil {
        log.Fatalln(err)
    }

    httpClient := &http.Client{}
    baseURL := "https://api.data.gov/ed/collegescorecard/v1/schools.json"

    filters := make(map[string]string)
    page := 0
    filters["school.degrees_awarded.predominant"] = "2,3"
    filters["fields"] = "id,school.name,school.city,2018.student.size,2017.student.size,2017.earnings.3_yrs_after_completion.overall_count_over_poverty_line,2016.repayment.3_yr_repayment.overall"
    filters["api_key"] = os.Getenv("API_KEY")

    outFile, err := os.Create("./out.txt")
    if err != nil {
        log.Fatalln(err)
    }
    writer := bufio.NewWriter(outFile)

    requestURL := getRequestURL(baseURL, filters)

    response := requestData(requestURL, httpClient)

    wg := sync.WaitGroup{}
    for (page+1)*response.Metadata.ResultsPerPage < response.Metadata.TotalResults {
        page++
        filters["page"] = strconv.Itoa(page)

        wg.Add(1)
        go func() {
            defer wg.Done()

            requestURL := getRequestURL(baseURL, filters)

            response := requestData(requestURL, httpClient)

            _, err = writer.WriteString(response.TextOutput())
            if err != nil {
                log.Fatalln(err)
            }
        }()

    }

    wg.Wait()

}

func getRequestURL(baseURL string, filters map[string]string) *url.URL {
    requestURL, err := url.Parse(baseURL)
    if err != nil {
        log.Fatalln(err)
    }

    query := requestURL.Query()
    for key, value := range filters {
        query.Set(key, value)
    }
    requestURL.RawQuery = query.Encode()

    return requestURL
}

func requestData(url *url.URL, httpClient *http.Client) CollegeScoreCardResponseDTO {
    request, _ := http.NewRequest(http.MethodGet, url.String(), nil)

    resp, err := httpClient.Do(request)
    if err != nil {
        log.Fatalln(err)
    }
    defer resp.Body.Close()

    var parsedResponse CollegeScoreCardResponseDTO
    err = json.NewDecoder(resp.Body).Decode(&parsedResponse)
    if err != nil {
        log.Fatalln(err)
    }

    return parsedResponse
}

I know another issue I will be running into is writing to the output file in the correct order, but I believe using channels to tell each routine what request finished writing could solve that. If I'm incorrect on that I would appreciate any advice on how to approach that as well.

Thanks in advance.


Solution

  • goroutines do not receive copies of data. When the compiler detects that a variable "escapes" the current function, it allocates that variable on the heap. In this case, filters is one such variable. When the goroutine starts, the filters it accesses is the same map as the main thread. Since you keep modifying filters in the main thread without locking, there is no guarantee of what the goroutine sees.

    I suggest you keep filters read-only, create a new map in the goroutine by copying all items from the filters, and add the "page" in the goroutine. You have to be careful to pass a copy of the page as well:

    go func(page int) {
       flt:=make(map[string]string)
       for k,v:=range filters {
         flt[k]=v
       }
       flt["page"]=strconv.Itoa(page)
       ...
    } (page)