How to use capture groups in GoLang?


I need to extract log statements from Go source files using a regex, but the value captured for the "msg" group is wrong. I use this function to extract the log statements from the contents of a file:

func extractLogStatements(content string) []LogStatement {
    logPattern := `flog\.(?P<sev>.*?)\(\s*flog.(?P<type>.*?),\s*("|fmt.Sprintf\(")(?P<msg>.*?)"`

    re := regexp.MustCompile(logPattern)
    matches := re.FindAllStringSubmatch(content, -1)

    logStatements := make([]LogStatement, 0, len(matches))
    for _, match := range matches {
        statement := LogStatement{
            Sev:  match[1],
            Type: match[2],
            Msg:  match[3],
        }
        logStatements = append(logStatements, statement)
    }

    return logStatements
}

Everything works except the regex pattern on the first line of the function: it doesn't capture the right value for every group, even though it worked fine when I tested it in an online regex tester.

Here are some examples of the logs I've been testing on:

flog.Info(flog.Application, fmt.Sprintf("unable to translate address for config: %v", err))

flog.Info(flog.Application, "unable to translate address for config")

flog.Info(flog.Application, fmt.Sprintf("Test 1"),
    lm.CrKind, objectType,
    lm.CrName, crName,
    lm.AppNS, namespace)

For the first log example, it should extract "Info" ("sev" capture group), "Application" ("type" capture group), and "unable to translate address for config: %v" ("msg" capture group). When I output to json I get:

[
    {
        "sev": "Info",
        "type": "Application",
        "msg": "fmt.Sprintf(\""
    },
    {
        "sev": "Info",
        "type": "Application",
        "msg": "fmt.Sprintf(\""
    },
    {
        "sev": "Info",
        "type": "Application",
        "msg": "fmt.Sprintf(\""
    }
]

So it's capturing the "sev" and "type" capture groups correctly but for the "msg" it's capturing "fmt.Sprintf("" when it should be getting "unable to translate address for config: %v".


Solution

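  • match[3] stores the value of the unnamed group ("|fmt.Sprintf\("). Go numbers every capturing group, named or not, by the position of its opening parenthesis, so "sev" is group 1, "type" is group 2, the unnamed group is group 3, and "msg" is group 4. If you don't want to capture the unnamed group, use ?: to turn it into a non-capturing group; "msg" then becomes group 3.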

    (?:"|fmt.Sprintf\(")
    

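    Since all the values you want are captured by named capture groups, another solution is to look up each group's index by name with re.SubexpIndex (available since Go 1.15), which keeps working even if the pattern gains or loses unnamed groups: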

    for _, match := range matches {
        statement := LogStatement{
            Sev:  match[re.SubexpIndex("sev")],
            Type: match[re.SubexpIndex("type")],
            Msg:  match[re.SubexpIndex("msg")],
        }
        logStatements = append(logStatements, statement)
    }