
Is it possible to replace characters only within a selected group?


Consider: word1.word2.worda-wordb-wordc.ext

Is there a regular expression capture-and-replace string that produces worda wordb wordc as the result, using Perl-compatible regular expressions?

I know you can capture the group of words a, b, c with /.+?\..+?\.(.+?)\.ext$/$1/, but I don't know how to additionally replace the dash (-) characters with space ( ) characters only within that group.
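
To make the starting point concrete, here is a minimal sketch of that capture-only substitution in Python (an illustrative choice; Python's `re` writes the backreference as `\1` rather than `$1`). The names `CAPTURE_ONLY`, `sample` and `captured` are hypothetical. It isolates the last group but leaves the dashes in place:

```python
import re

# The capture-only pattern from the question, unchanged.
CAPTURE_ONLY = re.compile(r".+?\..+?\.(.+?)\.ext$")

sample = "word1.word2.worda-wordb-wordc.ext"

# Replace the whole match with the captured group; dashes are not touched.
captured = CAPTURE_ONLY.sub(r"\1", sample)
print(captured)  # prints: worda-wordb-wordc
```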


You can assume that:

  • word groups are separated by a period (.)
  • words within a group are separated by a dash (-)
  • words are made up of alphanumeric characters [A-Za-z0-9]

I'm looking for a one-line /regex/replace/, not a script.


Solution

  • You can use this regex search-replace:

    (?:^(?:[^.\n]+\.)*?(?=[^.\n]+\.ext$)|(?!^)\G)([^\n-]+)-(?:([^\n.-]+)\..+)?
    

    And replace it with:

    $1 $2
    

    Output:

    worda wordb wordc
    

    RegEx Details:

    • (?:: Start non-capture group
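      • ^: Start of the line
      • (?:[^.\n]+\.)*?: Match 1+ non-dot characters followed by a dot. Repeat this group 0 or more times (lazy match)
      • (?=[^.\n]+\.ext$): Lookahead asserting that what follows is 1+ non-dot characters, then .ext, then the end of the line
      • |: OR
      • (?!^)\G: \G asserts the position where the previous match ended; (?!^) rules out the start of the line, where \G also matches before any match has been made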
    • ): End non-capture group
    • (: Start capture group #1
      • [^\n-]+: Match and capture 1+ non-hyphen characters in capture group #1
    • ): End capture group #1
    • -: Match a -
    • (?:: Start non-capture group
      • ([^\n.-]+): Match and capture 1+ non-dot, non-hyphen characters in capture group #2
      • \.: Match a dot
      • .+: Match the rest of the line (the extension)
    • )?: End non-capture group, ? makes this an optional match

    PS: You can remove \n everywhere from the above regex when matching single-line input.
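
The role of \G may be easier to see by emulating it step by step. The sketch below uses Python, whose standard `re` module does not support \G, so the second alternative is approximated with `Pattern.match(text, pos)`, which only matches exactly at `pos`. This is an illustration of the matching steps under stated assumptions, not a replacement for the one-line regex: the function name `replace_in_last_group` is hypothetical, input is assumed to be a single line, and an unset $2 is assumed to expand to an empty string.

```python
import re

# Shared tail of the answer's regex: group 1, a hyphen, and the optional
# "last word + extension" part that fills group 2.
TAIL = r"([^\n-]+)-(?:([^\n.-]+)\..+)?"

# First alternative: lazily skip leading groups until only "<group>.ext" remains.
FIRST = re.compile(r"^(?:[^.\n]+\.)*?(?=[^.\n]+\.ext$)" + TAIL)
# Second alternative: (?!^)\G is emulated by matching exactly at the previous end.
NEXT = re.compile(TAIL)


def replace_in_last_group(text):
    """Emulate a global replace of the regex with '$1 $2' on one line."""
    match = FIRST.match(text)
    if match is None:
        return text  # nothing to replace, e.g. no hyphen in the last group
    pieces = []
    while match is not None:
        # Assumption: an unset group 2 contributes an empty string.
        pieces.append(f"{match.group(1)} {match.group(2) or ''}")
        pos = match.end()
        match = NEXT.match(text, pos)  # anchored at pos, standing in for \G
    pieces.append(text[pos:])  # any text after the last match is kept
    return "".join(pieces)


print(replace_in_last_group("word1.word2.worda-wordb-wordc.ext"))
# prints: worda wordb wordc
```

On the sample input the first match consumes word1.word2.worda- and emits "worda " with an empty $2; the second match starts where the first ended, consumes wordb-wordc.ext, and emits "wordb wordc".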