
Replace all except multiple matches to keep


I have a string like:

"This is some text.|Some more text.|Some other text.|Some different text."

What I want to achieve is to select all text which is not matched by these conditions:

  1. From the beginning of the line select the first n characters.
  2. After each | select the first n characters.

If I set n=10 I should get this selection:

"me text.|text.| text.|rent text."

The final goal is to replace the selected text with nothing, so that what is left over is:

"This is so, Some more , Some other, Some diffe"

I have already managed to select the text I want to keep, but what I actually need is to match the unwanted text so that I can replace it.

I hope this is doable.

This is the current state of regex I have:

^(.{20})|(\|)(.{20})

It matches the text I want to keep (including the | ), and I haven't managed to adapt what I have found so far to my problem.


Solution

  • You might use a capture group and a negated character class that excludes the pipe.

    ([^\n|]{10})[^\n|]*(?:\||$)
    

    The pattern matches:

    • ( Capture group 1
      • [^\n|]{10} Repeat 10 times matching a char other than a newline or |
    • ) Close group 1
    • [^\n|]* Match optional chars other than a newline or |
    • (?:\||$) Either match | or the end of the string
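To make the breakdown above concrete, here is a minimal Python sketch that runs the pattern against the sample string from the question. The names `SEGMENT`, `sample`, `kept` and `consumed` are made up for illustration; Python is just one possible choice, since the question does not name a language.

```python
import re

# Pattern from the answer, with n fixed at 10.
SEGMENT = re.compile(r"([^\n|]{10})[^\n|]*(?:\||$)")

sample = "This is some text.|Some more text.|Some other text.|Some different text."

# With exactly one capture group, findall returns group 1 of every match,
# i.e. the first 10 characters of each segment.
kept = SEGMENT.findall(sample)

# The full matches also include the rest of the segment and its delimiter,
# which is the part that the replacement throws away.
consumed = [m.group(0) for m in SEGMENT.finditer(sample)]

print(kept)
print(consumed)
```

The full matches cover the whole input here, which is why replacing each match with group 1 leaves only the kept prefixes.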


    In the replacement, use group 1 followed by a comma and a space. Note that a single replacement pass leaves a trailing comma and space after the last segment, which you have to remove afterwards.

    $1,

    Output

    This is so, Some more , Some other, Some diffe, 
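As a rough sketch of the replacement step in Python (an assumption, since the question does not say which language or tool is used): `\1` is Python's spelling of `$1`, and the helper name `keep_prefixes_regex` is hypothetical. The trailing separator is removed in a second step.

```python
import re


def keep_prefixes_regex(text: str, n: int = 10) -> str:
    """Keep the first n characters of each |-separated segment."""
    # Same pattern as in the answer, with the repeat count made configurable.
    pattern = re.compile(r"([^\n|]{%d})[^\n|]*(?:\||$)" % n)
    # Replace every match with group 1 followed by ", ".
    replaced = pattern.sub(r"\1, ", text)
    # Every match appends ", ", so drop the one left after the last segment.
    if replaced.endswith(", "):
        replaced = replaced[:-2]
    return replaced


print(keep_prefixes_regex(
    "This is some text.|Some more text.|Some other text.|Some different text."
))
```

One thing to keep in mind with this pattern: a segment shorter than n characters cannot match, so it is left in the result unchanged, together with its pipe.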
    

    Another idea is to split on |, then loop over the split results, take the first 10 characters of each, and join the results back together with a comma and a space.
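The split-and-join idea could look like this in Python. This is only a sketch; the function name `keep_prefixes_split` and its parameters are hypothetical.

```python
def keep_prefixes_split(text: str, n: int = 10, sep: str = "|", joiner: str = ", ") -> str:
    """Split on sep, keep the first n characters of each part, then re-join."""
    # Slicing with [:n] is safe for parts shorter than n: they are kept whole.
    return joiner.join(part[:n] for part in text.split(sep))


print(keep_prefixes_split(
    "This is some text.|Some more text.|Some other text.|Some different text."
))
```

Unlike the regex version, this needs no cleanup of a trailing separator, and segments shorter than n characters are simply kept whole.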