How to generate a list of repeating patterns from a string in TCL?


set s1 "dir1/dir2/some_word_g3_ger_another_word_g1_ger_TEMP2"

How can I get the list {some_word_g3_ger_ another_word_g1_ger_} from s1?

I tried this:

regexp -inline -all {[^/]+_ger_} $s1

But it fails to split the match and returns a single element:

some_word_g3_ger_another_word_g1_ger_


Solution

  • You need to make the match non-greedy, i.e. ensure that it ends as soon as it has found a minimal match rather than when it has matched as much text as possible. This is done with the +? quantifier (the non-greedy counterpart of the greedy + quantifier); in this case a non-capturing group, (?:...), is also necessary.

    % regexp -inline -all {(?:[^/]+_ger_)+?} $s1
    some_word_g3_ger_ another_word_g1_ger_
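
    To see the difference side by side, here is a minimal sketch that runs the greedy pattern from the question and the non-greedy pattern above on the same input and counts the matches. The variable names greedy and lazy are only illustrative.

    ```tcl
    set s1 "dir1/dir2/some_word_g3_ger_another_word_g1_ger_TEMP2"

    # Greedy: the match extends to the last _ger_, so only one element comes back.
    set greedy [regexp -inline -all {[^/]+_ger_} $s1]

    # Non-greedy group: each match ends at the first _ger_ it reaches.
    set lazy [regexp -inline -all {(?:[^/]+_ger_)+?} $s1]

    puts "greedy: [llength $greedy] element(s)"
    puts "lazy:   [llength $lazy] element(s)"
    foreach token $lazy {
        puts $token
    }
    ```

    Because neither pattern contains capturing parentheses, -inline -all returns one list element per match, so the result can be passed straight to foreach or lindex.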
    

    ETA:

    A regular expression is helpful here since it can deal with both skipping the unwanted text and chopping up the tokens. If it is practicable to remove the unwanted text in a first step, several other methods become at least as useful. For example:

    set s1 some_word_g3_ger_another_word_g1_ger_
    string map {_ger_ {_ger_ }} $s1
    

    (This results in the string "some_word_g3_ger_ another_word_g1_ger_ " with a trailing blank, but it is still functionally equivalent to the list of those two tokens.)
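
    If the unwanted text has to be removed from the original string first, the two-step approach might look like the sketch below. It assumes the tokens always sit in the last path component and that everything after the final _ger_ should be dropped; the proc name ger_tokens is hypothetical.

    ```tcl
    # Hypothetical helper: return the _ger_-terminated tokens of a path's last component.
    proc ger_tokens {path} {
        # Step 1: drop the directory part (dir1/dir2/).
        set tail [file tail $path]

        # Step 2: drop whatever follows the last _ger_ (TEMP2 in the example).
        set last [string last _ger_ $tail]
        if {$last < 0} {
            return {}
        }
        set kept [string range $tail 0 [expr {$last + [string length _ger_] - 1}]]

        # Step 3: put a blank after every _ger_, trim the trailing one, and split.
        return [split [string trimright [string map {_ger_ {_ger_ }} $kept]]]
    }

    set s1 "dir1/dir2/some_word_g3_ger_another_word_g1_ger_TEMP2"
    puts [ger_tokens $s1]
    ```

    Passing the mapped string through split yields a proper list even if a token contains characters such as braces or backslashes, which could be misread if the string were used as a list directly.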

    Documentation: regexp, Syntax of Tcl regular expressions