regex, notepad++

Ignore specific character in Npp regex


I'm working with the Notepad++ flavour of regex. This...

Find: ([^`]{1,23} )
Replace: $0\n

...takes this input string...

Now is the time for all good men to come to the aid of the party.

...and produces this output string:

Now is the time for all

good men to come to the

aid of the party.

It splits the string into lines of 24 or fewer non-backtick (`) characters, breaking after a space. The last stretch of text is only split correctly if the final character of the input string is also a space.
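The behaviour described above, including the trailing-space requirement, can be reproduced outside Notepad++. The following is a minimal Python sketch, not the Notepad++ engine itself: it assumes Python's `re` module treats this simple pattern the same way, and `split_lines` is a hypothetical helper name. Python has no `$0` in replacement strings, so the whole match is appended through `m.group(0)`.

```python
import re

# Same pattern as the Find field: 1 to 23 non-backtick characters, then a space.
PATTERN = re.compile(r"([^`]{1,23} )")


def split_lines(text: str) -> str:
    # Equivalent of the Replace field: the whole match followed by a newline.
    return PATTERN.sub(lambda m: m.group(0) + "\n", text)


sentence = "Now is the time for all good men to come to the aid of the party."

with_space = split_lines(sentence + " ")   # input ends with a space
without_space = split_lines(sentence)      # input ends with the full stop

print(with_space)
print(without_space)
```

Without the trailing space, the last match has to back off to an earlier space, so the text is broken after "aid of the " and "party." is left over without a newline.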

This string...

Now is the time for all good men to █come to the aid█ of the party.

...splits differently.

Now is the time for all

good men to █come to

the aid█ of the party.

I'm looking for a way to skip over the █ characters - to process the input string as if the █s were not there.

[Notes: ` (backtick) characters are reserved to enclose text formatting tags, to be inserted later. █ characters will be used to mean "this bit of text will have tags inserted later", so they will be cleansed, but not yet. I'm using █ (full block) to represent the Unicode U+007F (DEL) character here, because U+007F doesn't display properly. I can also use Perl flavour regex in AHK, if absolutely necessary.]

These regex patterns fail to ignore █:

(([^`]|█?){1,23} )
((([^`])|(█)?){1,23} )
((([^`])|(?:█)){1,23} )

So, is there a way to do this?


Solution

  • You may use the following pattern:

    (?:[^`█]█*){1,23}[ ]
    

    Each repetition of the group matches one character that is neither a backtick nor a full block, followed by zero or more full block characters. The group may repeat between 1 and 23 times and must be followed by a space. Because the full blocks are consumed inside a repetition rather than as repetitions of their own, they are not counted toward the {1,23} quantifier.

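    You may also write the character as a Unicode code point escape (which looks better, in my opinion). Here \x{2588} is the full block; for the real DEL character mentioned in the question, \x{7F} would presumably take its place: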

    (?:[^`\x{2588}]\x{2588}*){1,23}[ ]
    

    Moreover, if the final character (of the last match) doesn't have to be a space character, you may use:

    (?:[^`\x{2588}]\x{2588}*){1,23}(?: |$)
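Both patterns can be checked against the sample sentence from the question. This is a minimal Python sketch under a few assumptions: Python's `re` module stands in for the Notepad++ engine, the escape is written `\u2588` because Python does not accept the `\x{2588}` form, and `wrap`, `STRICT` and `RELAXED` are hypothetical names used only for illustration.

```python
import re

# Pattern requiring a space after each chunk. Each repetition consumes one
# countable character plus any full blocks (U+2588) that follow it.
STRICT = re.compile(r"(?:[^`\u2588]\u2588*){1,23}[ ]")

# Variant that accepts either a space or the end of the input after a chunk.
RELAXED = re.compile(r"(?:[^`\u2588]\u2588*){1,23}(?: |$)")


def wrap(pattern: re.Pattern, text: str) -> str:
    # Append a newline after every match, like replacing with $0\n.
    return pattern.sub(lambda m: m.group(0) + "\n", text)


marked = "Now is the time for all good men to \u2588come to the aid\u2588 of the party."

strict_out = wrap(STRICT, marked + " ")  # input ends with a space
relaxed_out = wrap(RELAXED, marked)      # input ends with the full stop

print(strict_out)
print(relaxed_out)
```

In both cases the breaks fall after "all " and after "█come to the ", which are the same positions as for the sentence without any █ characters.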