Tags: regex, perl, negative-lookbehind

Perl regexp - replace all digits in a string with # unless they have a certain prefix


I am trying to replace all numbers in a string with '#', provided that they don't have a specific prefix. The numbers may appear as part of a word, or as a word on their own.

For example, using ABC as the prefix, this is the desired result.

Input:

sdkfjsd 12312981 sdkfjsdfhbnmawd 1298 ,smdfsdnfk2342423 
sdlkfsdfs 20349 ABC1203912 2034234aac <-- ABC<number> stays, the other numbers do not
ABC1203912

Result (note that lines 2 and 3 contain ABC followed by a number):

sdkfjsd # sdkfjsdfhbnmawd # ,smdfsdnfk#
sdlkfsdfs # ABC1203912 #aac <-- ABC<number> stays, the other numbers do not
ABC1203912

I tried to do it with a negative lookbehind: s/(?<!ABC)\d+/#/g. With this, only the first digit after ABC is left alone and the remaining digits are still replaced (ABC1203912 becomes ABC1#).
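A minimal Perl sketch reproducing the problem on one token from the sample input (the variable names are illustrative). The lookbehind is tested at every position where \d+ could start, so once the regex engine moves one character past the first digit, the text right before it is "BC1" rather than "ABC", and the rest of the number gets replaced:

```perl
use strict;
use warnings;

my $s = 'ABC1203912';

# The original attempt: the lookbehind only blocks a match that starts
# immediately after "ABC"; a match starting one digit later is allowed.
(my $out = $s) =~ s/(?<!ABC)\d+/#/g;

print "$out\n";    # prints "ABC1#"
```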

My next step would be to split the string into the parts that match ABC\d+ and the parts that don't, and perform a simple replace on the latter.
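For reference, the split-based fallback could look like the sketch below. The helper name mask_digits_split is hypothetical. It relies on the fact that when the pattern passed to Perl's split contains a capturing group, the captured separators are returned too, so the ABC<number> tokens end up at the odd indices of the resulting list:

```perl
use strict;
use warnings;

# Hypothetical helper: mask digit runs everywhere except in ABC<number> tokens.
sub mask_digits_split {
    my ($line) = @_;

    # The capture group makes split keep the ABC<number> tokens in the list.
    # A match at the very start yields a leading empty field, so the
    # even/odd pattern (plain text / token) always holds.
    my @parts = split /(ABC\d+)/, $line;

    for my $i (0 .. $#parts) {
        next if $i % 2;             # odd index: ABC<number> token, keep as-is
        $parts[$i] =~ s/\d+/#/g;    # even index: ordinary text, mask digit runs
    }
    return join '', @parts;
}

print mask_digits_split('sdlkfsdfs 20349 ABC1203912 2034234aac'), "\n";
# prints "sdlkfsdfs # ABC1203912 #aac"
```

This works, but it needs a second pass over each piece, which is exactly what the question is trying to avoid.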

I would appreciate any advice on how to do the whole thing without splitting it into multiple strings.

Thanks!

Edit 1: moved aac back to its proper position. Edit 2: I am using Perl 5.8.5, in case this is relevant. I can't upgrade to a newer version because of compatibility issues with code that I don't control.


Solution

  • I don't understand what you mean by "My next step would be to split the string into parts that contain ABC\d+, and perform a simple replace on the other parts.", but it looks like it is not your main issue here. Do let me know otherwise.

    To match every run of digits that is not preceded by the keyword ABC, you can use this regex:

    (?<!ABC|\d)\d+
    

    This prevents a match from starting at a digit if ABC or another digit comes right before it. The digit check is what stops \d+ from matching starting in the middle of a number.
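To see what the extra digit check buys, this sketch lists the digit runs each pattern would replace on the second sample line. It writes the condition as two fixed-width lookbehinds instead of (?<!ABC|\d), on the assumption that it has to run on the asker's Perl 5.8.5, which only supports fixed-width lookbehind:

```perl
use strict;
use warnings;

my $line = 'sdlkfsdfs 20349 ABC1203912 2034234aac';

# Only the ABC check: a match may start at the second digit after ABC.
my @without = $line =~ /(?<!ABC)\d+/g;

# ABC check plus digit check: a match can only start at the first digit of a run.
my @with = $line =~ /(?<!ABC)(?<!\d)\d+/g;

print "without: @without\n";    # prints "without: 20349 203912 2034234"
print "with:    @with\n";       # prints "with:    20349 2034234"
```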

    regex101 demo

    Note that two parts of the string in your question were moved around. I'm working only from the input you gave.


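    If the above doesn't work (e.g. the regex engine complains that the lookbehind cannot be of variable width, or something along these lines), split it into two fixed-width lookbehinds. This is likely with Perl 5.8.5, because older Perl versions only support fixed-width lookbehind, and the alternatives ABC and \d have different lengths. Both lookbehinds are checked at the same position, so the match fails if either ABC or a digit comes right before it: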

    (?<!ABC)(?<!\d)\d+
    

    regex101 demo
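Putting it together, a sketch of the full substitution over the sample input from the question. It sticks to the two fixed-width lookbehinds and avoids the s///r modifier, which only appeared in Perl 5.14, so it should also run on 5.8.5:

```perl
use strict;
use warnings;

my @input = (
    'sdkfjsd 12312981 sdkfjsdfhbnmawd 1298 ,smdfsdnfk2342423',
    'sdlkfsdfs 20349 ABC1203912 2034234aac',
    'ABC1203912',
);

my @output;
for my $line (@input) {
    # Replace each digit run with '#', unless the run starts right after "ABC".
    # (?<!\d) keeps a match from starting in the middle of a number.
    (my $masked = $line) =~ s/(?<!ABC)(?<!\d)\d+/#/g;
    push @output, $masked;
    print "$masked\n";
}
# prints:
# sdkfjsd # sdkfjsdfhbnmawd # ,smdfsdnfk#
# sdlkfsdfs # ABC1203912 #aac
# ABC1203912
```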