Count Tree Leaves Concurrently


I want to write a function using a concurrent model, since parallel processing should be more efficient when the input is very large. However, my concurrent version never returns.

Assume there is a struct defined as:

type Tree struct {
    Name     string   `json:"name"`
    SubTrees []*Tree  `json:"subTrees,omitempty"`
    Leaves   []string `json:"leaves"`
}

I want to write a function that calculates the total number of Leaves across the entire recursive structure. This is easily done with recursion:

func (tree *Tree) CountLeaves() int {
    curr := len(tree.Leaves)
    for _, s := range tree.SubTrees {
        curr += s.CountLeaves()
    }
    return curr
}

That works, but I expect it to be slow if the structure becomes very large, so I wanted to refactor it to run concurrently using channels. Here is my attempt at the refactor:

func (tree *Tree) CountLeaves() int {
    var wg sync.WaitGroup
    ch := make(chan int)
    defer close(ch)
    go count(tree, true, ch, &wg)

    var total int
    wg.Add(1)
    go func(total *int) {
        for x := range ch {
            fmt.Println(x)
            *total += x
        }
        wg.Done()
    }(&total)
    wg.Wait()

    return total
}

func count(t *Tree, root bool, ch chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    ch <- len(t.Leaves)
    if t.SubTrees != nil {
        wg.Add(len(t.SubTrees))
        for _, s := range t.SubTrees {
            go count(s, false, ch, wg)
        }
        wg.Wait()
    }

    if root {
        ch <- -1
    }
}

I am able to gather all the numbers I need through the channel to calculate the total number of Leaves, but the function never returns. The terminating value -1 from the root Tree is never sent or received on the channel, and I can't figure out why.

Any ideas?


Solution

  • I'm pretty sure your WaitGroup never gets enough wg.Done calls. Start with the consumer goroutine:

    go func(total *int) {
        for x := range ch {
            fmt.Println(x)
            *total += x
        }
        wg.Done()
    }(&total)
    

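    The only close(ch) is the deferred one in CountLeaves, and that runs only after wg.Wait() returns. So the range loop never ends, wg.Done is never called here, and wg.Wait() blocks forever. I think if you move it inside the loop: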

    go func(total *int) {
        for x := range ch {
            fmt.Println(x)
            *total += x
            wg.Done()
        }
    }(&total)
    

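    That way the consumer signals after every value instead of waiting for a close that never comes. Note that each wg.Done then needs a matching wg.Add, or the counter goes negative and the WaitGroup panics.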

    EDIT:

    Actually, I think there is one more issue:

    defer wg.Done()
    ch <- len(t.Leaves)
    if t.SubTrees != nil {
        wg.Add(len(t.SubTrees))
        for _, s := range t.SubTrees {
            go count(s, false, ch, wg)
        }
        wg.Wait()
    }
    

    The deferred wg.Done() won't run until count returns, but count can't return until wg.Wait() does, and wg.Wait() is waiting on that same WaitGroup. So this wg.Wait() will also wait forever. This should probably be:

    ch <- len(t.Leaves)
    if t.SubTrees != nil {
        wg.Add(len(t.SubTrees))
        for _, s := range t.SubTrees {
            go count(s, false, ch, wg)
        }
        wg.Done()
        wg.Wait()
    } else {
        wg.Done()
    }