I'm currently trying to work with the HTML tokenizer from https://godoc.org/golang.org/x/net/html.
What I want to do is the following: get all links from a URL and, if a link contains a certain string, add it to a list of URLs.

    resp, err = client.Get("someurl")
    var urls []string
    if err != nil {
        log.Fatal(err)
    }
    z := html.NewTokenizer(resp.Body)
    for {
        tt := z.Next()
        switch {
        case tt == html.ErrorToken:
            return
        case tt == html.StartTagToken:
            t := z.Token()
            isAnchor := t.Data == "a"
            if !isAnchor {
                continue
            }
            ok, url := getHref(t)
            if !ok {
                continue
            }
            if strings.Contains(url, "somestring") {
                urls = append(urls, url)
            }
        }
    }
    fmt.Println(urls)

This doesn't work because fmt.Println(urls) is unreachable. The loop does end at some point, of course, but the code doesn't compile. How do I make the code after the loop reachable?
Regards
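There's no break in the loop. The only way it ends is via a return, which sends control out of this function. This means that fmt.Println(urls) is not reachable.
A plain break inside the switch would only leave the switch, not the enclosing for loop, so the break needs a label that names the loop. Try this: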

    L:
    for {
        tt := z.Next()
        switch {
        case tt == html.ErrorToken:
            // Leaves the for loop labeled L, not just the switch.
            break L
        case tt == html.StartTagToken:
            t := z.Token()
            isAnchor := t.Data == "a"
            if !isAnchor {
                continue
            }
            ok, url := getHref(t)
            if !ok {
                continue
            }
            if strings.Contains(url, "somestring") {
                urls = append(urls, url)
            }
        }
    }
    fmt.Println(urls) // reachable now that the loop can end with a break