
Regular expression to find and replace full matches (consecutive repeats, preserve delimiter)


When searching, only complete matches of a word/phrase should be found and replaced:

  • If the replacement value is blank, delete the match.
  • If the replacement value is not blank, replace the match with that word/phrase while preserving the separator.
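To pin down the intended behaviour independently of any regex, here is a minimal Python sketch of the rules. It is not something Notepad++ runs, and `apply_rules` is a hypothetical name. It splits a line on commas, treats an item as a complete match only when it equals the word exactly, and collapses each run of consecutive matches into one replacement, or drops the run when the replacement is blank.

```python
def apply_rules(line: str, word: str, replacement: str) -> str:
    """Hypothetical reference for the rules above (not a regex).

    Each comma-separated item that equals `word` exactly is a complete match;
    a run of consecutive matches becomes one `replacement`, or disappears
    entirely when `replacement` is blank.
    """
    out = []
    in_run = False  # True while we are inside a run of exact matches
    for item in line.split(","):
        if item == word:
            # Emit the replacement once, at the start of the run.
            if not in_run and replacement:
                out.append(replacement)
            in_run = True
        else:
            out.append(item)  # partial matches such as "ACoconut" are kept
            in_run = False
    return ",".join(out)


# Usage on two of the sample lines:
print(apply_rules("Coconut,Coconut,Orange", "Coconut", ""))  # Orange
print(apply_rules("Orange,Coconut,CoconutA,Pear", "Coconut", "Tomato"))  # Orange,Tomato,CoconutA,Pear
```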

Original Text:

1:
Coconut
2:
ACoconut
3:
Coconut!
4:
!CoconutA

5:
Coconut,Orange,Coconut,Pear,Coconut
6:
ACoconut,Orange,Coconut!,Pear,!CoconutA

7:
Coconut,Coconut
8:
ACoconut,Coconut!
9:
Coconut,Coconut,Coconut
10:
ACoconut,Coconut!,!CoconutA

11:
Coconut,Coconut,Orange
12:
ACoconut,!Coconut,Orange
13:
Coconut,Coconut,Coconut,Orange
14:
Coconut,ACoconut,!Coconut,Orange
15:
Orange,Coconut,Coconut,Pear
16:
Orange,Coconut,CoconutA,Pear

17:
Orange,Coconut,Coconut,Coconut,Pear
18:
Orange,ACoconut,Coconut,!Coconut,Pear
19:
Pear,Coconut,Coconut
20:
Pear,Coconut!,Coconut
21:
Pear,!Coconut,Coconut,ACoconut
22:
Pear,CoconutA,!Coconut,Coconut

Nuances:

  1. I use Notepad++ for search and replace.
  2. If there are extra characters before and/or after the word/phrase, it is not a complete match (commas do not count as extra characters).
  3. If the word/phrase is repeated consecutively, separated by the delimiter (here, a comma), the whole run is replaced by a single word/phrase. This applies whether the run is at the very beginning, in the middle, or at the end of the line.
  4. The word/phrase must match completely (for this, my expression uses ^, the comma delimiter, and $). I don't want to type the separator into the replacement myself; it should come from part of the expression (capturing groups).

Expected Results:

  • Replace with: (blank, not a space)

    1:
    
    2:
    ACoconut
    3:
    Coconut!
    4:
    !CoconutA
    
    5:
    Orange,Pear
    6:
    ACoconut,Orange,Coconut!,Pear,!CoconutA
    
    7:
    
    8:
    ACoconut,Coconut!
    9:
    
    10:
    ACoconut,Coconut!,!CoconutA
    
    11:
    Orange
    12:
    ACoconut,!Coconut,Orange
    13:
    Orange
    14:
    ACoconut,!Coconut,Orange
    15:
    Orange,Pear
    16:
    Orange,CoconutA,Pear
    
    17:
    Orange,Pear
    18:
    Orange,ACoconut,!Coconut,Pear
    19:
    Pear
    20:
    Pear,Coconut!
    21:
    Pear,!Coconut,ACoconut
    22:
    Pear,CoconutA,!Coconut
    
  • Replace with: Tomato

    1:
    Tomato
    2:
    ACoconut
    3:
    Coconut!
    4:
    !CoconutA
    
    5:
    Tomato,Orange,Tomato,Pear,Tomato
    6:
    ACoconut,Orange,Coconut!,Pear,!CoconutA
    
    7:
    Tomato
    8:
    ACoconut,Coconut!
    9:
    Tomato
    10:
    ACoconut,Coconut!,!CoconutA
    
    11:
    Tomato,Orange
    12:
    ACoconut,!Coconut,Orange
    13:
    Tomato,Orange
    14:
    Tomato,ACoconut,!Coconut,Orange
    15:
    Orange,Tomato,Pear
    16:
    Orange,Tomato,CoconutA,Pear
    
    17:
    Orange,Tomato,Pear
    18:
    Orange,ACoconut,Tomato,!Coconut,Pear
    19:
    Pear,Tomato
    20:
    Pear,Coconut!,Tomato
    21:
    Pear,!Coconut,Tomato,ACoconut
    22:
    Pear,CoconutA,!Coconut,Tomato
    

My Try (with Results):

  • Find what: ^(Coconut)$|^(Coconut)(,)|(,)(Coconut)(,)|(,)(Coconut)$
    1. Replace with: $4$7$3$6

      1:
      
      2:
      ACoconut
      3:
      Coconut!
      4:
      !CoconutA
      
      5:
      ,Orange,,Pear,
      6:
      ACoconut,Orange,Coconut!,Pear,!CoconutA
      
      7:
      ,Coconut
      8:
      ACoconut,Coconut!
      9:
      ,Coconut,
      10:
      ACoconut,Coconut!,!CoconutA
      
      11:
      ,Coconut,Orange
      12:
      ACoconut,!Coconut,Orange
      13:
      ,Coconut,,Orange
      14:
      ,ACoconut,!Coconut,Orange
      15:
      Orange,,Coconut,Pear
      16:
      Orange,,CoconutA,Pear
      
      17:
      Orange,,Coconut,,Pear
      18:
      Orange,ACoconut,,!Coconut,Pear
      19:
      Pear,,Coconut
      20:
      Pear,Coconut!,
      21:
      Pear,!Coconut,,ACoconut
      22:
      Pear,CoconutA,!Coconut,
      
    2. Replace with: $4$7Tomato$3$6

      1:
      Tomato
      2:
      ACoconut
      3:
      Coconut!
      4:
      !CoconutA
      
      5:
      Tomato,Orange,Tomato,Pear,Tomato
      6:
      ACoconut,Orange,Coconut!,Pear,!CoconutA
      
      7:
      Tomato,Coconut
      8:
      ACoconut,Coconut!
      9:
      Tomato,Coconut,Tomato
      10:
      ACoconut,Coconut!,!CoconutA
      
      11:
      Tomato,Coconut,Orange
      12:
      ACoconut,!Coconut,Orange
      13:
      Tomato,Coconut,Tomato,Orange
      14:
      Tomato,ACoconut,!Coconut,Orange
      15:
      Orange,Tomato,Coconut,Pear
      16:
      Orange,Tomato,CoconutA,Pear
      
      17:
      Orange,Tomato,Coconut,Tomato,Pear
      18:
      Orange,ACoconut,Tomato,!Coconut,Pear
      19:
      Pear,Tomato,Coconut
      20:
      Pear,Coconut!,Tomato
      21:
      Pear,!Coconut,Tomato,ACoconut
      22:
      Pear,CoconutA,!Coconut,Tomato
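The attempt's output can be reproduced outside Notepad++ to see why it breaks. This is a minimal Python sketch, assuming Python's `re` treats this particular pattern the same way as Notepad++'s Boost engine (it uses only groups, `|`, `^` and `$`), and relying on Python 3.5+ expanding groups that did not take part in the match to an empty string, as Notepad++ evidently does here. Every alternative except the first consumes the comma on at least one side of Coconut. After a match, scanning resumes past that comma, so the next Coconut of a run has no comma in front of it, and `^` does not match mid-line; that item is skipped. Meanwhile `$4$6` writes both consumed commas back, which produces the doubled separators. `attempt` is a hypothetical name.

```python
import re

# Pattern from "My Try", unchanged. re.MULTILINE makes ^ and $ match at line
# boundaries, as they do in Notepad++.
ATTEMPT = re.compile(
    r"^(Coconut)$|^(Coconut)(,)|(,)(Coconut)(,)|(,)(Coconut)$", re.MULTILINE
)


def attempt(text: str, word: str = "") -> str:
    # Equivalent of "$4$7<word>$3$6"; groups that did not take part in the
    # match expand to "" (Python 3.5+). \g<n> avoids ambiguity with digits.
    return ATTEMPT.sub(r"\g<4>\g<7>" + word + r"\g<3>\g<6>", text)


print(attempt("Coconut,Coconut"))  # ,Coconut
print(attempt("Coconut,Coconut,Coconut", "Tomato"))  # Tomato,Coconut,Tomato
```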
      

Solution

    • Ctrl+H
    • Find what: ^Coconut(?:,Coconut)*(?:,|$)|(?:^|,)Coconut(?:,Coconut)*$|(,)Coconut(?:,Coconut)*(?:,|$)
    • Replace with: (?1$1)
    • TICK Match case
    • TICK Wrap around
    • SELECT Regular expression
    • UNTICK . matches newline
    • Replace all

    Explanation:

      ^                 # beginning of line
        Coconut         # literally
        (?:,Coconut)*   # a comma followed by Coconut, 0 or more times (removes a leading run whole)
        (?:,|$)         # a comma or end of line
    |                   # OR
        (?:^|,)         # beginning of line or a comma
        Coconut         # literally
        (?:,Coconut)*   # a comma followed by Coconut, 0 or more times
      $                 # end of line
    |                   # OR
        (,)             # group 1: a comma
        Coconut         # literally
        (?:,Coconut)*   # a comma followed by Coconut, 0 or more times
        (?:,|$)         # a comma or end of line
    

    Replacement:

    (?1$1)              # if group 1 (the comma) matched, write it back; otherwise write nothing
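To check the blank replacement outside Notepad++, here is a minimal Python sketch. Python's `re.sub` has no `(?1$1)` conditional, so a small function stands in for it: it writes group 1 (the comma) back when that group took part in the match and nothing otherwise. Note that the first alternative here is `^Coconut(?:,Coconut)*(?:,|$)`; with `^Coconut(?:,|$)` alone, a run at the start of a line loses only its first item (e.g. `Coconut,Coconut` leaves `Coconut`), because the comma that alternative consumed is no longer available to the other alternatives. `remove_word` is a hypothetical name.

```python
import re

# Blank-replacement pattern; the three alternatives cover a run of Coconut
# at the start of the line, at the end of the line, and in the middle.
BLANK = re.compile(
    r"^Coconut(?:,Coconut)*(?:,|$)"      # run at the start (or the whole line)
    r"|(?:^|,)Coconut(?:,Coconut)*$"     # run at the end of the line
    r"|(,)Coconut(?:,Coconut)*(?:,|$)",  # run in the middle; group 1 = comma
    re.MULTILINE,
)


def remove_word(text: str) -> str:
    # Stand-in for the (?1$1) conditional replacement: keep the comma
    # captured by group 1 if that group matched, otherwise insert nothing.
    return BLANK.sub(lambda m: m.group(1) or "", text)


sample = "Coconut,Coconut,Orange\nOrange,Coconut,CoconutA,Pear\nPear,Coconut,Coconut"
print(remove_word(sample))
# Orange
# Orange,CoconutA,Pear
# Pear
```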
    


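    To replace with Tomato instead:

    • Find what: (^|,)Coconut(?:,Coconut)*(,|$)
    • Replace with: $1Tomato$2

    Group 1 holds the leading comma (empty at the start of the line) and group 2 the trailing comma (empty at the end of the line), so the delimiters are preserved while the whole run becomes a single Tomato.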
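The Tomato pattern can be generalised to any word/phrase. Below is a minimal Python sketch with a hypothetical `replace_runs` helper: `re.escape` makes the phrase match literally (so characters such as `.` are not treated as regex syntax), and a function builds the replacement so it is inserted verbatim. With an empty replacement this pattern would leave a doubled comma (`Orange,Coconut,Pear` becomes `Orange,,Pear`), which is why the blank case needs the separate conditional pattern.

```python
import re


def replace_runs(text: str, phrase: str, replacement: str) -> str:
    """Hypothetical helper: replace every comma-delimited run of `phrase`
    with one `replacement`, keeping the commas around the run."""
    p = re.escape(phrase)  # match the word/phrase literally
    pattern = re.compile(rf"(^|,){p}(?:,{p})*(,|$)", re.MULTILINE)
    # Group 1 = leading comma or "", group 2 = trailing comma or "".
    return pattern.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_runs("Orange,Coconut,Coconut,Coconut,Pear", "Coconut", "Tomato"))
# Orange,Tomato,Pear
```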