Tags: unix, replace, tcl, str-replace, regsub

Replacing a specific part


I have a list like this:

DEL075MD1BWP30P140LVT
AN2D4BWP30P140LVT
INVD0P7BWP40P140
IND2D6BWP30P140LVT

I want to replace everything between D and BWP with a *.

How can I do that in Unix and Tcl?


Solution

    • Do you have the whole list available at the same time, or are you getting one item at a time from somewhere?
    • Should all D-BWP groups be processed, or just one per item?
    • If just one per item, should it be the first or last (those are the easiest alternatives)?

    Tcl REs don't have lookbehind, which would have been nice here. But you can do without both lookbehind and lookahead if you capture the goalposts and paste them into the replacement as back references. The regular expression for the text between the goalposts should be [^DB]+, i.e. one or more characters that are neither D nor B (so that the match can't escape the goalposts and latch onto other Ds or Bs in the text). So: {(D)[^DB]+(BWP)} (putting braces around the RE is usually a good idea).
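    To see why [^DB]+ is used rather than a plain .+, here is a small sketch that runs both on the first item from the question. The variable names loose and tight are only illustrative.

```tcl
set item DEL075MD1BWP30P140LVT

# ".+" is greedy and happily runs across the inner D.
set loose [regsub {(D).+(BWP)} $item {\1*\2}]

# "[^DB]+" cannot cross a D or B, so the match starts at the last D before BWP.
set tight [regsub {(D)[^DB]+(BWP)} $item {\1*\2}]

puts $loose   ;# D*BWP30P140LVT
puts $tight   ;# DEL075MD*BWP30P140LVT
```

    Note that with [^DB]+ the leading DEL075M is left alone, because the match can only begin at the D closest to BWP. If the intent was to replace from the very first D, a different pattern would be needed.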

    If you have the whole list and want to process all groups, try this:

    set result [regsub -all {(D)[^DB]+(BWP)} $lines {\1*\2}]
    

    (If you can only work with one line at a time, it's basically the same: use a variable holding a single line instead of one holding the whole list. In the following examples, I use lmap to generate the individual lines, which means I need the whole list anyway; this is just an example.)
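    For the one-line-at-a-time case, a minimal sketch might wrap the substitution in a small helper. The proc name starBetween is hypothetical, and the foreach loop merely stands in for wherever the items really come from.

```tcl
# Hypothetical helper (not part of the original answer): rewrite one item.
proc starBetween {line} {
    return [regsub -all {(D)[^DB]+(BWP)} $line {\1*\2}]
}

# Items arriving one at a time, simulated here with a loop over two samples.
set result {}
foreach line {AN2D4BWP30P140LVT INVD0P7BWP40P140} {
    lappend result [starBetween $line]
}

puts $result   ;# AN2D*BWP30P140LVT INVD*BWP40P140
```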

    Process just the first group in each line:

    set result [lmap line $lines {
        regsub {(D)[^DB]+(BWP)} $line {\1*\2}
    }]
    

    Process just the last group in each line:

    set result [lmap line $lines {
        regsub {(D)[^DB]+(BWP[^D]*)$} $line {\1*\2}
    }]
    

    The {(D)[^DB]+(BWP[^D]*)$} RE extends the right goalpost to ensure that there is no D (and hence possibly a new group) anywhere between the goalpost and the end of the string.
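    Every item in the question's list contains only one D-BWP group, so the three variants give the same result there. As a sketch of where they differ, here they are applied to a made-up item with two groups (AD1BWPXD2BWPZ is an assumption for illustration, not taken from the question):

```tcl
# Hypothetical item with two D...BWP groups.
set item AD1BWPXD2BWPZ

set all   [regsub -all {(D)[^DB]+(BWP)} $item {\1*\2}]
set first [regsub {(D)[^DB]+(BWP)} $item {\1*\2}]
set last  [regsub {(D)[^DB]+(BWP[^D]*)$} $item {\1*\2}]

puts $all     ;# AD*BWPXD*BWPZ
puts $first   ;# AD*BWPXD2BWPZ
puts $last    ;# AD1BWPXD*BWPZ
```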

    Documentation: lmap (for Tcl 8.5), lmap, regsub, set, Syntax of Tcl regular expressions