csv go

Cannot parse a CSV file that starts with an unexpected character


I'm trying to parse a CSV file, but its header starts with a ZWNBSP character, which makes my code fail. How can I parse this CSV as expected?

csv content

ZWNBSP"AccountId","AccountNumber","AccountName","CustomerId","CustomerName","CurrencyCode","Spend","Impressions","Clicks","Installs","VideoViews","Conversions","Sales","TimePeriod"

code trial

package main

import (
    "encoding/csv"
    "fmt"
    "os"
)

func main() {
    file, err := os.Open("/Desktop/193657270154964993.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

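    // ReadAll consumes the whole file, so call it once rather than in a loop.
    reader := csv.NewReader(file)
    records, err := reader.ReadAll()
    if err != nil {
        // Report the error instead of silently dropping it.
        fmt.Println("Error:", err)
        return
    }

    fmt.Println(records)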
}

error info

bare " in non-quoted-field
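
For context, the failure can be reproduced without the file. The sketch below assumes the ZWNBSP shown in the CSV content is the UTF-8 byte order mark (U+FEFF); `parse` is a hypothetical helper introduced only for this demo. Because the BOM sits in front of the opening quote, encoding/csv treats the first field as unquoted and then rejects the quote that follows. With an unquoted first field there is no error, but the BOM silently stays in the data.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parse is a hypothetical helper: it runs encoding/csv over an in-memory string.
func parse(input string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(input)).ReadAll()
}

func main() {
	// Quoted first field preceded by a BOM, like the header in the question.
	// The returned error is a *csv.ParseError wrapping csv.ErrBareQuote.
	_, err := parse("\ufeff\"AccountId\",\"AccountNumber\"\n\"1\",\"2\"\n")
	fmt.Println("quoted header:", err)

	// Unquoted first field preceded by a BOM: parsing succeeds,
	// but the first field still begins with U+FEFF.
	records, err := parse("\ufeffA,B\n1,x\n")
	fmt.Printf("unquoted header: %q, err: %v\n", records, err)
}
```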


Solution

  • The ZWNBSP at the start of your file is a UTF-8 byte order mark (BOM, U+FEFF). I recommend using Go's own golang.org/x/text package: it has a Transformer type and BOM-aware encodings that can remove (and insert) a BOM with just one extra line of code.

    I created this small sample CSV, which starts with a UTF-8 BOM:

    A,B
    1,x
    2,y
    3,z
    

    If I run this small program:

    package main

    import (
        "encoding/csv"
        "fmt"
        "io"
        "os"
    )

    func main() {
        var r io.Reader

        r, _ = os.Open("input.csv") // errors ignored for brevity

        csvr := csv.NewReader(r)
        records, _ := csvr.ReadAll()
        fmt.Printf("%q\n", records)
    }
    

    it prints, with the BOM before "A":

    [["\ufeffA" "B"] ["1" "x"] ["2" "y"] ["3" "z"]]
    

    If I modify that program and add a Transformer that specifically decodes UTF-8 BOM-encoded bytes (to plain UTF-8):

    import (
        ...
        "golang.org/x/text/encoding/unicode"
        "golang.org/x/text/transform"
    )
    
    func main() {
        var r io.Reader
    
        r, _ = os.Open("input.csv")
        r = transform.NewReader(r, unicode.UTF8BOM.NewDecoder())
    
        csvr := csv.NewReader(r)
        records, _ := csvr.ReadAll()
        fmt.Printf("%q\n", records)
    }
    

    it prints:

    [["A" "B"] ["1" "x"] ["2" "y"] ["3" "z"]]
    

    I chose to generically declare r as io.Reader so I could use the same variable and take that transformer-line in and out, for a compact example. You could also write something more explicit and idiomatic, like:

    fIn, _ := os.Open("input.csv")
    defer fIn.Close()
    bomDecoder := transform.NewReader(fIn, unicode.UTF8BOM.NewDecoder())
    csvReader := csv.NewReader(bomDecoder)
    

    If you need to re-encode with a BOM when you're done processing the CSV, create a writer with a BOM encoder:

    fOut, _ := os.Create("output.csv")
    defer fOut.Close()
    
    t := transform.NewWriter(fOut, unicode.UTF8BOM.NewEncoder())
    
    csvw := csv.NewWriter(t)
    csvw.WriteAll(records)
    
    % hexdump -C output.csv
    00000000  ef bb bf 41 2c 42 0a 31  2c 78 0a 32 2c 79 0a 33  |...A,B.1,x.2,y.3|
    00000010  2c 7a 0a                                          |,z.|
    00000013
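
If adding golang.org/x/text as a dependency is not an option, the BOM can also be skipped with the standard library alone. This is a different technique from the Transformer approach above: a minimal sketch that peeks at the first three bytes and discards them only when they are the UTF-8 BOM. `skipBOM` and `parseCSV` are hypothetical helpers written for this example, not library functions, and the sketch handles UTF-8 only (no UTF-16 BOMs).

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM is a hypothetical helper: it wraps r in a buffered reader and
// discards a leading UTF-8 BOM if one is present.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input. On short input it returns fewer bytes
	// plus an error; ignoring the error is safe because the comparison
	// below then simply fails.
	if head, _ := br.Peek(len(utf8BOM)); bytes.Equal(head, utf8BOM) {
		br.Discard(len(utf8BOM))
	}
	return br
}

// parseCSV is a hypothetical convenience wrapper used for the demo.
func parseCSV(input string) ([][]string, error) {
	return csv.NewReader(skipBOM(strings.NewReader(input))).ReadAll()
}

func main() {
	// Same shape as the header in the question: BOM, then quoted fields.
	records, err := parseCSV("\ufeff\"AccountId\",\"AccountNumber\"\n\"1\",\"2\"\n")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q\n", records)
}
```

With this input the program prints [["AccountId" "AccountNumber"] ["1" "2"]]. For a real file, pass the *os.File to skipBOM instead of a strings.Reader. Input without a BOM passes through unchanged.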