
Go equivalent of C's negated scansets


How can I mimic C's negated scansets in Go?

For an example input string: aaaa, bbbb

In Go, using:

fmt.Sscanf(input, "%s, %s", &str1, &str2)

The result is that only str1 is set, to: aaaa,

In C, one could use a format string such as "%[^,], %s" to avoid this problem. Is there a way to accomplish this in Go?
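
For reference, the behaviour described above can be reproduced with a small program. This is only a sketch; scanPair is a hypothetical helper name that wraps the call from the question so the number of scanned items and the error can be inspected as well.

```go
package main

import "fmt"

// scanPair (hypothetical helper) repeats the call from the question and
// returns everything fmt.Sscanf reports.
func scanPair(input string) (str1, str2 string, n int, err error) {
	// %s reads up to the next whitespace, so the comma ends up in str1
	// and the literal ',' in the format can no longer match.
	n, err = fmt.Sscanf(input, "%s, %s", &str1, &str2)
	return
}

func main() {
	str1, str2, n, err := scanPair("aaaa, bbbb")
	fmt.Printf("n=%d str1=%q str2=%q err=%v\n", n, str1, str2, err)
}
```

Here only one item is scanned, str1 holds "aaaa," with the trailing comma, str2 stays empty, and a non-nil error is returned.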


Solution

  • Go doesn't support scansets directly the way C does, partly because the usual approach is to read a line and split it with something like strings.FieldsFunc. That's admittedly a simplistic view. For uniformly formatted data, you could use bufio.Scanner to do essentially the same thing with any io.Reader. However, if you had to deal with something like this format:

    // Name; email@domain
    //
    // Anything other than ';' is valid for name.
    // Anything before '@' is valid for email.
    // For domain, only A-Z, a-z, and 0-9, as well as '-' and '.' are valid.
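    // line is assumed to hold the input string; ALNUM is assumed to be a
    // macro expanding to a string literal of the alphanumeric characters.
    sscanf(line, "%[^;]; %[^@]@%[-." ALNUM "]", name, email, domain);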
    

    then you'd run into trouble, because each field now has its own rules and the parser has to keep track of which field it is in. In such a case, you might prefer working with bufio.Reader and parsing things manually. There's also the option of implementing fmt.Scanner. Here's some sample code to give you an idea of how easy that can be:

    import (
        "errors"
        "fmt"
    )

    // Scanset acts as a filter when scanning strings.
    // The zero value of a Scanset will discard all non-whitespace characters.
    type Scanset struct {
        ps        *string
        delimFunc func(rune) bool
    }
    
    // Create a new Scanset to filter delimiter characters.
    // Once f(delimChar) returns false, scanning will end.
    // If s is nil, characters for which f(delimChar) returns true are discarded.
    // If f is nil, !unicode.IsSpace(delimChar) is used
    // (i.e. read until unicode.IsSpace(delimChar) returns true).
    func NewScanset(s *string, f func(r rune) bool) *Scanset {
        return &Scanset{
            ps:        s,
            delimFunc: f,
        }
    }
    
    // Scan implements the fmt.Scanner interface for the Scanset type.
    func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
        if verb != 'v' && verb != 's' {
            return errors.New("scansets only work with %v and %s verbs")
        }
        tok, err := state.Token(false, s.delimFunc)
        if err != nil {
            return err
        }
        if s.ps != nil {
            *s.ps = string(tok)
        }
        return nil
    }
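
As a usage sketch for the input from the question, the Scanset type above can be handed to fmt.Sscanf in place of a *string. The type is repeated here so the program is self-contained; notComma and parsePair are hypothetical names introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Scanset is the type from the answer, repeated so this file compiles alone.
type Scanset struct {
	ps        *string
	delimFunc func(rune) bool
}

func NewScanset(s *string, f func(r rune) bool) *Scanset {
	return &Scanset{ps: s, delimFunc: f}
}

// Scan implements fmt.Scanner.
func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
	if verb != 'v' && verb != 's' {
		return errors.New("scansets only work with %v and %s verbs")
	}
	tok, err := state.Token(false, s.delimFunc)
	if err != nil {
		return err
	}
	if s.ps != nil {
		*s.ps = string(tok)
	}
	return nil
}

// notComma plays the role of C's %[^,]: accept every rune except ','.
func notComma(r rune) bool { return r != ',' }

// parsePair (hypothetical helper) scans "first, second" from input.
func parsePair(input string) (first, second string, err error) {
	_, err = fmt.Sscanf(input, "%v, %v",
		NewScanset(&first, notComma), // like %[^,]
		NewScanset(&second, nil))     // nil: read up to whitespace, like %s
	return first, second, err
}

func main() {
	first, second, err := parsePair("aaaa, bbbb")
	fmt.Printf("first=%q second=%q err=%v\n", first, second, err)
}
```

Because notComma accepts spaces, the first field may also contain them, so an input such as "John Smith, bbbb" yields "John Smith" and "bbbb".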
    


    It's not C's scansets, but it's close enough. You should be validating your data anyway, even with formatted input, because a format string carries no context (and adding context to the scanning code violates the KISS principle and hurts the readability of your code).

    For example, a short regex like [A-Za-z]([A-Za-z0-9-]?.)[A-Za-z0-9] isn't enough to validate a domain name, and a simplistic scanset would merely be the equivalent of [A-Za-z0-9.-]. The scanset is enough to scan the string from a file or whatever other reader you might be using, but it isn't enough to validate that string. For validation, a regex or even a proper library would be a much better option.
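
To illustrate the split between scanning and validating, here is a minimal sketch that reads "Name; email@domain" lines with bufio.Scanner, separates the fields, and only then applies a character check. The names record, parseRecord and domainChars are hypothetical, strings.Cut assumes Go 1.18 or later, and the pattern is just the simplistic [A-Za-z0-9.-] class mentioned above, not a real domain validator.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// domainChars mirrors the simplistic scanset [A-Za-z0-9.-]. It checks which
// characters appear, not whether the result is a valid domain name.
var domainChars = regexp.MustCompile(`^[A-Za-z0-9.-]+$`)

// record is a hypothetical container for one "Name; email@domain" line.
type record struct {
	Name, Email, Domain string
}

// parseRecord splits a line into its fields, then validates the domain part.
func parseRecord(line string) (record, error) {
	// Everything before the first ';' is the name, like %[^;].
	name, rest, ok := strings.Cut(line, ";")
	if !ok {
		return record{}, fmt.Errorf("missing ';' in %q", line)
	}
	// Everything before the first '@' is the email part, like %[^@].
	// TrimSpace drops whitespace at both ends of the remainder.
	email, domain, ok := strings.Cut(strings.TrimSpace(rest), "@")
	if !ok {
		return record{}, fmt.Errorf("missing '@' in %q", line)
	}
	if !domainChars.MatchString(domain) {
		return record{}, fmt.Errorf("invalid characters in domain %q", domain)
	}
	return record{Name: name, Email: email, Domain: domain}, nil
}

func main() {
	input := "Jane Doe; jane@example.com\nJohn; john@bad_domain\n"
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		rec, err := parseRecord(sc.Text())
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%+v\n", rec)
	}
}
```

The second sample line is rejected because '_' is outside the allowed class. A stricter check, for instance one that enforces label lengths, would belong in the validation step or in a dedicated library rather than in the scanning code.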