Search code examples
goconcurrencychannelgoroutine

How to start a new goroutine each time a channel is updated


I am making a program that monitors different webpages, each time a new url is added to a page, I would like to start a new goroutine to scrape the new url. I am trying to simulate this like this:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var Wg sync.WaitGroup
    link := make(chan string)
    startList := []string{}
    go func() {
        for i := 0; i < 20; i++ {
            //should simulate the monitoring of the original web page
            nextLink := fmt.Sprintf("cool-website-%d", i)
            link <- nextLink
        }
    }()

    for i := 0; i < 20; i++ {
        newLink := <-link
        startList = append(startList, newLink)
        Wg.Add(1)
        go simulateScraping(i, startList[i])
        Wg.Done()
    }
    Wg.Wait()
}

func simulateScraping(i int, link string) {
    fmt.Printf("Simulating process %d\n", i)
    fmt.Printf("scraping www.%s.com\n", link)
    time.Sleep(time.Duration(30) * time.Second)
    fmt.Printf("Finished process %d\n", i)
}

This results in the following error fatal error: all goroutines are asleep - deadlock!. How do I only start the simulateScraping function each time that newLink is updated or when startList is appended to?

Thanks!


Solution

  • I see several problems with the code.

    1. Wait group is useless in the code because Wg.Done is called immediately and does not wait until the simulateScraping finishes, because it's running in parallel.

    To fix this, the closure function could be used

            go func(i int) {
                simulateScraping(i, newLink)
                Wg.Done()
            }(i)
    
    
    1. Instead of an increment loop, I would use for-each range loop. It allows code to be executed as soon as a new value get to a channel and automatically breaks when the channel closes.
        var i int
        for newLink := range link {
            Wg.Add(1)
            go func(i int) {
                simulateScraping(i, newLink)
                Wg.Done()
            }(i)
            i++
        }
        Wg.Wait()
    
    1. startList := []string{} Looks useless. Not sure how it was supposed to be used.

    2. Channel must be closed.

        go func() {
            for i := 0; i < 20; i++ {
                //should simulate the monitoring of the original web page
                nextLink := fmt.Sprintf("cool-website-%d", i)
                link <- nextLink
            }
           close(link) // Closing the channel
        }()
    

    The whole code

    package main
    
    import (
        "fmt"
        "sync"
        "time"
    )
    
    func main() {
        var Wg sync.WaitGroup
        link := make(chan string)
        go func() {
            for i := 0; i < 20; i++ {
                //should simulate the monitoring of the original web page
                nextLink := fmt.Sprintf("cool-website-%d", i)
                link <- nextLink
            }
            close(link)
        }()
    
        var i int
        for newLink := range link {
            Wg.Add(1)
            go func(i int) {
                simulateScraping(i, newLink)
                Wg.Done()
            }(i)
            i++
        }
        Wg.Wait()
    }
    
    func simulateScraping(i int, link string) {
        fmt.Printf("Simulating process %d\n", i)
        fmt.Printf("scraping www.%s.com\n", link)
        time.Sleep(3 * time.Second)
        fmt.Printf("Finished process %d\n", i)
    }
    
    

    Here is a good talk about "Concurrency Patterns In Go"