Split element on line break with GoQuery


I'm trying to get content from a page with GoQuery, but for some reason I can't split the text on line breaks (br).

The HTML looks like this:

<ul>
    <li>I'm skipped</li>

    <li> 
        Text Into  - <p>Whatever</p>
        <p>
            Line 1<br />
            Line 2<br />
            Line 3<br />
            Line 4<br />
            Line N
        </p>
    </li> 
</ul>

Go code:

doc, err := goquery.NewDocumentFromReader(res.Body)
if err != nil {
    panic(err)
}

doc.Find("ul").Each(func(i int, s *goquery.Selection) {

    str := s.Find("li p").Next().Text()

    fmt.Println(str, "--")

})

For some reason I'm not able to get each line, separated by a break in the p tag, as a single item. The output of the code above is:

Line1Line2Line3Line4LineN--

But the output I'm trying to achieve should look like this:

Line1--
Line2--
Line3--
Line4--
LineN--

Since I'm a Go newbie, please let me know in a comment if something is not clear, and I will try to explain it as well as I can.

Thanks.


Solution

  • According to the documentation, .Text() does the following:

    Text gets the combined text contents of each element in the set of matched elements, including their descendants.

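    A br element has no text of its own, so the lines on either side of it run together. What you actually want to do is get the contents of the p element and then filter out any br tags. As dave's answer states, there are newline characters in there, so I've also trimmed those: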

    package main
    
    import (
        "fmt"
        "github.com/PuerkitoBio/goquery"
        "strings"
    )
    
    var input string = `
    <ul>
        <li>I'm skipped</li>
    
        <li> 
            Text Into  - <p>Whatever</p>
            <p>
                Line 1<br />
                Line 2<br />
                Line 3<br />
                Line 4<br />
                Line N
            </p>
        </li> 
    </ul>
    `
    
    func main() {
        doc, err := goquery.NewDocumentFromReader(strings.NewReader(input))
        if err != nil {
            panic(err)
        }
    
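        doc.Find("ul").Each(func(i int, s *goquery.Selection) {
            // Find("li p") matches both paragraphs; Next() moves to the
            // sibling that follows each match, which here is the second <p>.
            p := s.Find("li p").Next()

            // Contents() returns the child nodes, text nodes included.
            p.Contents().Each(func(_ int, node *goquery.Selection) {
                // Skip the <br> elements and print each text node on its own line.
                if !node.Is("br") {
                    fmt.Println(strings.TrimSpace(node.Text()), "--")
                }
            })
        })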
    }
    

    Produces:

    Line 1 --
    Line 2 --
    Line 3 --
    Line 4 --
    Line N --