How a scanner can be implemented with a custom split


I have a log file and need to parse each record in it using Go. Each record begins with "#" and can span one or more lines:

# Line1
# Line2
Continued line2
Continued line2
# line3
.....

Here is my code so far (I'm a beginner):

    f, _ := os.Open(mylog)
    scanner := bufio.NewScanner(f)
    var queryRec string

    for scanner.Scan() {
            line := scanner.Text()

            if strings.HasPrefix(line, "# ") && len(queryRec) == 0 {
                    queryRec = line
            } else if !strings.HasPrefix(line, "# ") && len(queryRec) == 0 {
                    fmt.Println("There is a big problem!!!")
            } else if !strings.HasPrefix(line, "# ") && len(queryRec) != 0 {
                    queryRec += line
            } else if strings.HasPrefix(line, "# ") && len(queryRec) != 0 {
                    queryRec = line
            }
    }

Thanks,


Solution

  • The Scanner type has a method called Split that takes a SplitFunc, which determines how the scanner breaks its input into tokens. The default SplitFunc is ScanLines; its implementation in the bufio source is a useful reference. Starting from that, you can write your own SplitFunc to split the reader's content according to your specific format.

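    // crunchSplitFunc needs the "bytes" package to be imported.
    func crunchSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
    
        // Return nothing if at end of file and no data is left.
        if atEOF && len(data) == 0 {
            return 0, nil, nil
        }
    
        // Find the first newline that is followed by a pound sign.
        // The token is everything before the newline; advancing by
        // i + 1 skips the newline so the next token starts at '#'.
        if i := bytes.Index(data, []byte("\n#")); i >= 0 {
            return i + 1, data[0:i], nil
        }
    
        // If at end of file with data left, return it as the last token.
        if atEOF {
            return len(data), data, nil
        }
    
        // No record boundary found yet: request more data.
        return 0, nil, nil
    }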
    

    You can see the full implementation of the example at https://play.golang.org/p/ecCYkTzme4. The documentation provides all the insight needed to implement something like this.
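
In case the playground link is unavailable, here is a minimal, self-contained sketch of how such a split function can be wired into a Scanner. It is not the code behind the link: the names `splitOnRecord` and `splitRecords` are hypothetical, the input is the sample from the question, and trimming the trailing newline from the last record is my own assumption about what is wanted.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOnRecord is a bufio.SplitFunc (hypothetical name) that ends a token
// at every newline immediately followed by '#'.
func splitOnRecord(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Nothing left to scan.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// A record ends just before "\n#"; skip the newline itself.
	if i := bytes.Index(data, []byte("\n#")); i >= 0 {
		return i + 1, data[:i], nil
	}
	// Last record: return what is left, without the trailing line break
	// (an assumption; drop the TrimRight call to keep it).
	if atEOF {
		return len(data), bytes.TrimRight(data, "\r\n"), nil
	}
	// No boundary in the buffer yet: ask the Scanner for more data.
	return 0, nil, nil
}

// splitRecords (hypothetical helper) scans input and returns one string
// per record.
func splitRecords(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitOnRecord)

	var records []string
	for scanner.Scan() {
		records = append(records, scanner.Text())
	}
	return records, scanner.Err()
}

func main() {
	input := "# Line1\n# Line2\nContinued line2\nContinued line2\n# line3\n"

	records, err := splitRecords(input)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for i, rec := range records {
		fmt.Printf("record %d: %q\n", i+1, rec)
	}
}
```

For a real log file, pass the `*os.File` returned by `os.Open` to `bufio.NewScanner` instead of a `strings.Reader`, and check the error from `os.Open`. By default a Scanner rejects tokens larger than `bufio.MaxScanTokenSize` (64 KiB), so very long records would need a bigger buffer set with `scanner.Buffer` before the first call to `Scan`.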