I want to know the idiomatic way to solve this (my current attempt throws a deadlock error). The recursion branches an unknown number of times, so I cannot simply close the channel.
http://play.golang.org/p/avLf_sQJj_
I have made it work by passing a pointer to a number and incrementing it, and I've also looked into using sync.WaitGroup. I didn't feel (and I may be wrong) that I'd come up with an elegant solution. The Go examples I have seen tend to be simple, clever and concise.
This is the last exercise from A Tour of Go, https://tour.golang.org/#73
Do you know 'how a Go programmer' would manage this? Any help would be appreciated. I'm trying to learn well from the start.
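For reference, the sync.WaitGroup shape I looked at is roughly this (a rough sketch, not my actual code; it reuses the Fetcher and fetcher definitions from the Tour, needs "fmt" and "sync" imported, and skips the visited-URL bookkeeping):

// Every recursive call registers itself on the WaitGroup; a separate
// goroutine closes the channel once all of them have finished.
func crawlWG(url string, depth int, fetcher Fetcher, ch chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- err.Error()
		return
	}
	ch <- fmt.Sprintf("found: %s %q", url, body)
	if depth <= 1 {
		return
	}
	for _, u := range urls {
		wg.Add(1) // register the child before starting it
		go crawlWG(u, depth-1, fetcher, ch, wg)
	}
}

func main() {
	ch := make(chan string)
	var wg sync.WaitGroup
	wg.Add(1)
	go crawlWG("http://golang.org/", 4, fetcher, ch, &wg)
	go func() {
		wg.Wait()
		close(ch) // nothing can send once Wait returns
	}()
	for line := range ch {
		fmt.Println(line)
	}
}

It works, but the extra closer goroutine is part of what felt inelegant to me.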
Instead of involving sync.WaitGroup, you could extend the result sent for each parsed URL to include the number of new URLs found. In your main loop you would then keep reading results as long as there is something left to collect.
In your case the number of URLs found happens to equal the number of goroutines spawned, but it doesn't have to be. I would personally spawn a more or less fixed number of fetching goroutines, so you don't open too many HTTP requests (or at least you have control over it) – a sketch of that worker-pool variant follows the full example below. Your main loop wouldn't change, as it doesn't care how the fetching is executed. The important point is that you send either a result or an error for each URL – I've modified the code so it doesn't spawn new goroutines when the depth is already 1.
A side effect of this solution is that you can easily print the progress in your main loop.
Here is the example on playground:
http://play.golang.org/p/BRlUc6bojf
package main

import (
	"fmt"
)

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

type Res struct {
	url   string
	body  string
	found int // Number of new urls found
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, ch chan Res, errs chan error, visited map[string]bool) {
	body, urls, err := fetcher.Fetch(url)
	visited[url] = true
	if err != nil {
		errs <- err
		return
	}

	newUrls := 0
	if depth > 1 {
		for _, u := range urls {
			if !visited[u] {
				newUrls++
				go Crawl(u, depth-1, fetcher, ch, errs, visited)
			}
		}
	}

	// Send the result along with the number of urls still to be fetched
	ch <- Res{url, body, newUrls}
}

func main() {
	ch := make(chan Res)
	errs := make(chan error)
	visited := map[string]bool{}

	go Crawl("http://golang.org/", 4, fetcher, ch, errs, visited)

	// tocollect tracks how many results or errors are still expected;
	// each collected result tells us how many more fetches it started.
	tocollect := 1
	for n := 0; n < tocollect; n++ {
		select {
		case s := <-ch:
			fmt.Printf("found: %s %q\n", s.url, s.body)
			tocollect += s.found
		case e := <-errs:
			fmt.Println(e)
		}
	}
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}
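If you want the fixed number of fetching goroutines mentioned above, a worker-pool sketch could look roughly like this. It reuses the Fetcher and fetcher definitions from the example; the job and result types, poolCrawl and the worker count are my own names, not part of the exercise:

type job struct {
	url   string
	depth int
}

type result struct {
	url   string
	body  string
	urls  []string
	depth int
	err   error
}

// worker pulls URLs off the jobs channel and reports what it fetched.
func worker(fetcher Fetcher, jobs <-chan job, results chan<- result) {
	for j := range jobs {
		body, urls, err := fetcher.Fetch(j.url)
		results <- result{j.url, body, urls, j.depth, err}
	}
}

// poolCrawl crawls from start using a fixed pool of workers. Only this
// function touches the visited map, so no locking is needed.
func poolCrawl(start string, depth int, fetcher Fetcher, numWorkers int) {
	jobs := make(chan job)
	results := make(chan result)
	for i := 0; i < numWorkers; i++ {
		go worker(fetcher, jobs, results)
	}

	visited := map[string]bool{start: true}
	queue := []job{{start, depth}}
	pending := 0 // jobs handed to workers but not yet reported back

	for len(queue) > 0 || pending > 0 {
		// Only offer a job when one is queued; sending on a nil
		// channel blocks forever, so that case is then ignored.
		var send chan job
		var next job
		if len(queue) > 0 {
			send = jobs
			next = queue[0]
		}
		select {
		case send <- next:
			queue = queue[1:]
			pending++
		case r := <-results:
			pending--
			if r.err != nil {
				fmt.Println(r.err)
				continue
			}
			fmt.Printf("found: %s %q\n", r.url, r.body)
			if r.depth > 1 {
				for _, u := range r.urls {
					if !visited[u] {
						visited[u] = true
						queue = append(queue, job{u, r.depth - 1})
					}
				}
			}
		}
	}
	close(jobs) // the workers' range loops end and they exit
}

You would call it as poolCrawl("http://golang.org/", 4, fetcher, 3) instead of the go Crawl(...) line in main. A nice side effect of this shape is that only the coordinating loop reads and writes visited, so the shared-map race in the recursive version disappears.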
And yes, follow @jimt's advice and make access to the map thread-safe.
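A minimal sketch of that would be to wrap the map in a small type of your own (the fetched name and TestAndSet method below are my invention, not part of the Tour; it needs "sync" imported):

// fetched guards the visited set so several goroutines can share it safely.
type fetched struct {
	mu   sync.Mutex
	urls map[string]bool
}

// TestAndSet marks url as visited and reports whether it had already been seen.
func (f *fetched) TestAndSet(url string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	seen := f.urls[url]
	f.urls[url] = true
	return seen
}

Crawl would then take a *fetched instead of the bare map and call TestAndSet before spawning a goroutine for a URL, so two goroutines can never both decide to crawl the same page.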