
How to convert HTML table to array with golang

I'm having trouble converting an HTML table into a Golang array. I've tried to do it with both x/net/html and goquery, without success with either of them.

Let's say we have this HTML table:

<table>
    <tr>
        <td>Row 1, Content 1</td>
        <td>Row 1, Content 2</td>
        <td>Row 1, Content 3</td>
        <td>Row 1, Content 4</td>
    </tr>
    <tr>
        <td>Row 2, Content 1</td>
        <td>Row 2, Content 2</td>
        <td>Row 2, Content 3</td>
        <td>Row 2, Content 4</td>
    </tr>
</table>

And I'd like to end up with this array:

| Row 1, Content 1 | Row 1, Content 2 |
| Row 2, Content 1 | Row 2, Content 2 |

As you can see, I'm simply ignoring Contents 3 and 4.

My extraction code:

func extractValue(content []byte) {
  doc, _ := goquery.NewDocumentFromReader(bytes.NewReader(content))

  doc.Find("table tr td").Each(func(i int, td *goquery.Selection) {
    // ...
  })
}

I've tried adding a counter variable that would be responsible for skipping the <td> elements I don't want to convert, and calling


but with no luck. Does anyone have an idea of what I should do to accomplish this?



  • You can get away with package only.

    package main

    import (
        "fmt"
        "strings"

        "golang.org/x/net/html"
    )

    var body = strings.NewReader(`
    <table>
        <tr>
            <td>Row 1, Content 1</td>
            <td>Row 1, Content 2</td>
            <td>Row 1, Content 3</td>
            <td>Row 1, Content 4</td>
        </tr>
        <tr>
            <td>Row 2, Content 1</td>
            <td>Row 2, Content 2</td>
            <td>Row 2, Content 3</td>
            <td>Row 2, Content 4</td>
        </tr>
    </table>`)

    func main() {
        z := html.NewTokenizer(body)
        content := []string{}
        // Read tokens until Next returns ErrorToken (io.EOF at the end of the input).
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                break
            }
            if tt == html.StartTagToken {
                t := z.Token()
                if t.Data == "td" {
                    // The token right after <td> holds the cell's text.
                    if z.Next() == html.TextToken {
                        text := strings.TrimSpace(string(z.Text()))
                        content = append(content, text)
                    }
                }
            }
        }
        // Print to check the slice's content
        fmt.Println(content)
    }

    This code is written for this particular HTML pattern only, and it collects the text of every <td> (Contents 3 and 4 included) into one flat slice, but refactoring it to be more general shouldn't be hard.