How to select first chars with a custom word boundary?


I have test cases with a series of words like this:

    {
        input:    "Halley's Comet",
        expected: "HC",
    },
    {
        input:    "First In, First Out",
        expected: "FIFO",
    },
    {
        input:    "The Road _Not_ Taken",
        expected: "TRNT",
    },

I want a single regex that matches the first letter of each of these words, never treats "_" as a first letter, and counts a single quote as part of the word.
Currently, I have this regex, which works with PCRE syntax but not with Go's regexp package: (?<![a-zA-Z0-9'])([a-zA-Z0-9'])
I know Go doesn't support lookarounds, so I'm looking for a good way to get the same result without them.

I also use this call to get a slice of all the matches: re.FindAllString(s, -1)

Thanks for helping.


Solution

  • Something that plays with character classes and word boundaries should suffice:

    \b_*([a-z])[a-z]*(?:'s)?_*\b\W*
    


    Usage:

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    func main() {
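        // (?i) makes the match case-insensitive; group 1 captures the first letter of each word.
        re := regexp.MustCompile(`(?i)\b_*([a-z])[a-z]*(?:'s)?_*\b\W*`)
        // Replace each matched word, plus any trailing non-word characters, with its first letter.
        fmt.Println(re.ReplaceAllString("O'Brian's dog", "$1")) // prints "OBd"
    }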
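
Since the question collects matches with re.FindAllString(s, -1), the same pattern can also be used with FindAllStringSubmatch to get the first letters as a slice instead of a replaced string. The following is only a sketch: the helper names firstLetters and acronym are made up for illustration, and the inputs are the three test cases from the question.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe is the pattern from the answer above; capture group 1 holds
// the first letter of each matched word.
var wordRe = regexp.MustCompile(`(?i)\b_*([a-z])[a-z]*(?:'s)?_*\b\W*`)

// firstLetters (hypothetical helper) returns the first letter of every
// word matched by wordRe, in order of appearance.
func firstLetters(s string) []string {
	var letters []string
	for _, m := range wordRe.FindAllStringSubmatch(s, -1) {
		letters = append(letters, m[1]) // m[0] is the whole match, m[1] is group 1
	}
	return letters
}

// acronym (hypothetical helper) joins the first letters into one string.
func acronym(s string) string {
	return strings.Join(firstLetters(s), "")
}

func main() {
	inputs := []string{
		"Halley's Comet",
		"First In, First Out",
		"The Road _Not_ Taken",
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %s\n", in, acronym(in))
	}
}
```

Note that the pattern only treats a trailing 's as part of a word, so a name such as O'Brian still contributes two letters (O and B).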