Tags: concurrency, go, goroutine

Should we do nested goroutines?


I'm trying to build a parser for a large number of files, and I can't find information about what might be called "nested goroutines" (maybe this is not the right name?).

Given a lot of files, each of them having a lot of lines, should I do:

for file in folder:
    go do1

def do1:
    for line in file:
        go do2

def do2:
    do_something

Or should I use only "one level" of goroutines, and do the following:

for file in folder:
    for line in file:
        go do_something
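
In concrete Go, the nested version would look roughly like the sketch below (readLines and doSomething are hypothetical placeholders for the real parsing, and a sync.WaitGroup is added so the program waits for the goroutines to finish; the one-level version is the same loop without the outer go):

package main

import "sync"

// Hypothetical placeholders standing in for the real parsing work.
func readLines(file string) []string { return []string{"line 1", "line 2"} }
func doSomething(line string)        {}

func main() {
    files := []string{"a.txt", "b.txt"}
    var wg sync.WaitGroup
    for _, file := range files {
        wg.Add(1)
        go func(file string) {
            defer wg.Done()
            for _, line := range readLines(file) {
                wg.Add(1)
                go func(line string) {
                    defer wg.Done()
                    doSomething(line)
                }(line)
            }
        }(file)
    }
    wg.Wait() // wait for every file and line goroutine to finish
}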

My question is primarily about performance.

Thanks for making it all the way to this sentence!


Solution

  • If you go through with the architecture you've specified, you have a good chance of running out of CPU, memory, etc., because you're going to be creating an arbitrary number of workers. I suggest instead going with an architecture that lets you throttle via channels. For example:

    In your main process feed the files into a channel:

    for _, file := range folder {
      fileChan <- file
    }
    

    then in another goroutine break the files into lines and feed those into a channel:

    // receive each file from the channel and feed its lines into the line channel
    for file := range fileChan {
      for _, line := range file {
        lineChan <- line
      }
    }
    

    then in a 3rd goroutine pop out the lines and do what you will with them:

    // each worker pops lines off the channel and processes them
    for line := range lineChan {
      // process the line
    }
    

    The main advantage of this approach is that you can create as many or as few goroutines as your system can handle and pass them all the same channels; whichever goroutine gets to the channel first handles the work, so you can throttle the amount of resources you're using (see the worker-pool sketch below).

    Here is a working example: http://play.golang.org/p/-Qjd0sTtyP
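
    Putting the pieces together, a self-contained sketch of this layout might look like the following. It assumes the file contents are already in memory as strings, uses strings.Split as a stand-in for however you actually read lines, and picks an arbitrary pool of 4 line workers; channel closes and a sync.WaitGroup are added so the program can finish cleanly.

    package main

    import (
      "fmt"
      "strings"
      "sync"
    )

    func main() {
      // Placeholder input: file contents keyed by name. In practice you would read from disk.
      folder := map[string]string{
        "a.txt": "line one\nline two",
        "b.txt": "line three\nline four",
      }

      fileChan := make(chan string)
      lineChan := make(chan string)

      // Stage 1: feed the file contents into fileChan, then close it to signal completion.
      go func() {
        for _, contents := range folder {
          fileChan <- contents
        }
        close(fileChan)
      }()

      // Stage 2: break each file into lines and feed them into lineChan.
      go func() {
        for contents := range fileChan {
          for _, line := range strings.Split(contents, "\n") {
            lineChan <- line
          }
        }
        close(lineChan)
      }()

      // Stage 3: a fixed pool of workers drains lineChan concurrently.
      const numWorkers = 4
      var wg sync.WaitGroup
      for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func(id int) {
          defer wg.Done()
          for line := range lineChan {
            fmt.Printf("worker %d: %s\n", id, line) // process the line here
          }
        }(i)
      }
      wg.Wait()
    }

    Raising or lowering numWorkers is the throttle: it bounds how many lines are processed at once, regardless of how many files or lines there are.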