Golang regex: exclude words preceded by "." or ","


There are some strings like:

.texta texti(
 .textb textj(
 textc textk(

Go playground:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    text := ".texta texta(, .textb textb(, textc textc("
    pattern := `[^.,]\s*(\w+)\s+(\w+)\s*\(`
    re := regexp.MustCompile(pattern)
    matches := re.FindAllStringSubmatch(text, -1)

    for _, match := range matches {
        if len(match) > 1 {
            fmt.Println(match[1])
        }
    }
}

Why does the result contain "exta" and "extb"?

The goal is to get only "textc", excluding words that start with "." or ",".

If the pattern is \s*(\w+)\s+(\w+)\s*\(, the result is "texta", "textb" and "textc".
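Printing the full match (match[0]) next to the first capture group shows where each match actually begins. The sketch below reuses the question's input and pattern; the helper name matchesWithGroup1 is hypothetical and only exists for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWithGroup1 (hypothetical helper) returns, for every non-overlapping
// match of pattern in text, the full match and its first capture group.
func matchesWithGroup1(pattern, text string) [][2]string {
	re := regexp.MustCompile(pattern)
	var out [][2]string
	for _, m := range re.FindAllStringSubmatch(text, -1) {
		out = append(out, [2]string{m[0], m[1]})
	}
	return out
}

func main() {
	text := ".texta texta(, .textb textb(, textc textc("
	// Same pattern as in the question: [^.,] is one character that is
	// neither "." nor ",", so it is also allowed to consume a letter.
	for _, m := range matchesWithGroup1(`[^.,]\s*(\w+)\s+(\w+)\s*\(`, text) {
		fmt.Printf("full=%q group1=%q\n", m[0], m[1])
	}
}
```

Running it prints full="texta texta(" group1="exta", then full="textb textb(" group1="extb", then full=" textc textc(" group1="textc": in the first two matches [^.,] has eaten the leading "t", while in the third it consumed the space before "textc".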


Solution

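  • The problem is that your regex is not "anchored" to any specific position, while you expect \w+ to start matching at the beginning of a word. [^.,] matches any single character other than "." and ",", and that includes word characters. In ".texta texta(", the leading "." cannot match [^.,], so the engine moves one position to the right, lets [^.,] consume the "t", and (\w+) then captures only "exta" (the same happens with "extb"). So you need to make sure the negated character class cannot match a word character. Since you may also want to allow a match at the start of the string, you will need to add an alternative for that position.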

    You can use

    pattern := `(?:[^.,\w]|^)\s*(\w+)\s+(\w+)\s*\(`
    

    where (?:[^.,\w]|^) matches either a character other than ".", "," or a word character, or the position at the start of the string.

    See the Go playground demo.
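A minimal runnable sketch of the fix, assuming you only need capture group 1. The names wantedRe and firstWords are hypothetical. The second call checks the start-of-string case: for "textc textk(" the ^ alternative is what produces the match, since there is no preceding character for [^.,\w] to consume.

```go
package main

import (
	"fmt"
	"regexp"
)

// wantedRe (hypothetical name) holds the pattern from the answer: the first
// word must be preceded by a character that is not ".", "," or a word
// character, or it must sit at the very start of the input (^).
var wantedRe = regexp.MustCompile(`(?:[^.,\w]|^)\s*(\w+)\s+(\w+)\s*\(`)

// firstWords (hypothetical helper) returns capture group 1 of every match.
func firstWords(text string) []string {
	var words []string
	for _, m := range wantedRe.FindAllStringSubmatch(text, -1) {
		words = append(words, m[1])
	}
	return words
}

func main() {
	// The question's input: only "textc" is preceded by a space.
	fmt.Println(firstWords(".texta texta(, .textb textb(, textc textc("))
	// Input that starts directly with the wanted word: matched via ^.
	fmt.Println(firstWords("textc textk("))
}
```

Both calls print [textc]. Note that ^ here only matches at the start of the whole input; for multi-line text where each line should be checked on its own, the (?m) flag would be needed.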