I'm working with the Notepad++ flavour of regex. This...
Find: ([^`]{1,23} )
Replace: $0\n
...takes this input string...
Now is the time for all good men to come to the aid of the party.
...and produces this output string:
Now is the time for all
good men to come to the
aid of the party.
It splits the string into lines of 24 or fewer non-backtick (`) chars, splitting after spaces. It only works if the final character of the input string is also a space character.
This string...
Now is the time for all good men to █come to the aid█ of the party.
...splits differently.
Now is the time for all
good men to █come to
the aid█ of the party.
I'm looking for a way to skip over the █ characters - to process the input string as if the █s were not there.
[Notes: ` (backtick) characters are reserved to enclose text formatting tags, to be inserted later. █ characters will be used to mean "this bit of text will have tags inserted later", so they will be cleansed, but not yet. I'm using █ (full block) to represent the Unicode 7F (DEL) character here, because 7F doesn't display properly. I can also use Perl flavour regex in AHK, if absolutely necessary.]
These regex patterns fail to ignore █:
(([^`]|█?){1,23} )
((([^`])|(█)?){1,23} )
((([^`])|(?:█)){1,23} )
So, is there a way to do this?
You may use the following pattern:
(?:[^`█]█*){1,23}[ ]
Each repetition of the group matches one character that is neither a backtick nor a full block, followed by zero or more full block characters. The group may repeat between 1 and 23 times, and because the full blocks are consumed inside a repetition rather than as repetitions of their own, they are not counted toward the {1,23} quantifier.
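As a rough check outside Notepad++, the same pattern can be tried in Python's re module. This is only a sketch: the names BLOCK, PATTERN and wrap are made up for illustration, Python writes the whole-match reference as \g<0> rather than $0, and the input keeps the trailing space that this version of the pattern requires.

```python
import re

# Stand-in character from the question; swap in "\x7f" if the real text uses DEL.
BLOCK = "\u2588"

# One counted unit: a char that is neither a backtick nor a block,
# followed by any number of blocks, which ride along uncounted.
PATTERN = re.compile("(?:[^`" + BLOCK + "]" + BLOCK + "*){1,23}[ ]")

def wrap(text):
    # \g<0> is the whole match, the Python counterpart of $0 in Notepad++.
    return PATTERN.sub(r"\g<0>\n", text)

# Note the trailing space: without it the last chunk would not match.
text = "Now is the time for all good men to █come to the aid█ of the party. "
print(wrap(text))
```

The printed lines break after "all " and after "the ", the same places as in the block-free input.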
Demo.
You may also write the full block as a Unicode code point (which looks better, in my opinion):
(?:[^`\x{2588}]\x{2588}*){1,23}[ ]
Moreover, if the final character (of the last match) doesn't have to be a space character, you may use:
(?:[^`\x{2588}]\x{2588}*){1,23}(?: |$)
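A small Python sketch of this last variant, for anyone who wants to test it outside the editor. The helper name split_lines and its parameters are hypothetical, and the pattern is built from a placeholder argument so that "\x7f" (the DEL character the question actually uses) can be substituted for the full block.

```python
import re

def split_lines(text, max_chars=23, skip="\u2588"):
    """Return the chunks matched by the space-or-end variant of the pattern.

    `skip` is the character to ignore while counting; it is an assumption
    here that a single placeholder character is enough.
    """
    s = re.escape(skip)
    # Same shape as the pattern above, with (?: |$) as the terminator.
    pattern = re.compile("(?:[^`%s]%s*){1,%d}(?: |$)" % (s, s, max_chars))
    return [m.group(0) for m in pattern.finditer(text)]

# No trailing space this time; the final chunk is closed by $ instead.
lines = split_lines(
    "Now is the time for all good men to █come to the aid█ of the party."
)
for line in lines:
    print(line)
```

Collecting the matches rather than substituting makes it easy to inspect each chunk; the same pattern works with a find-and-replace just as well.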