The goal of my little macro is to find every 1000th row and append a nonsense string to it, so that I can then transpose the data and add commas between the row values.
I found a search expression for the replace function: ((.+?\r\n){1000}) and I replace it with: ZZZZZ
When I have fewer than 2500 rows, everything works fine. If I go over that number (it is approximate), I get an error saying: The complexity of matching the expression has exceeded available resources. Google has given me about 3 results for this specific problem, one of which was on Stack Overflow, but it seems to focus on a very different topic (Different results for unicode/multibyte modifier and mb_ereg_replace).
Could someone please tell me why I am getting this error and how to fix it, or a different way to append "ZZZZZ" to every 1000th row in my data set?
A repeated marking/capturing group inside another marking/capturing group gives a result that is nearly always not the expected one.
A correct Perl regular expression search string would be: ^((?:.+?\r\n){1000})
The inner group is now a non-marking group because of the ?: after the opening parenthesis. The inner group exists only so that the multiplier can be applied to it, and therefore it should not mark anything, i.e. copy the found string onto the stack for reuse via a back-reference.
Note for the future:
A multiplier like ?, +, *, {n}, {n,} or {n,m} applied directly to a marking group is almost always a mistake, because the group then holds only the text matched by its last repetition.
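The effect of a multiplier on a marking group can be seen in a small sketch. It uses Python's re module purely as an illustration, on the assumption that it treats capturing and non-capturing groups the same way as the Perl engine discussed here; the variable names are made up for the example.

```python
import re

text = "ababab"

# Multiplier applied to the capturing group itself: the group is
# overwritten on every repetition, so only the last "ab" is kept.
last_only = re.match(r"(ab)+", text).group(1)

# Multiplier applied to an inner non-capturing group, with the
# capturing group wrapped around the whole repetition.
whole_run = re.match(r"((?:ab)+)", text).group(1)

print(last_only)  # ab
print(whole_run)  # ababab
```

The second form is the same shape as the corrected search string: (?:...) carries the multiplier, and the outer group marks the complete block.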
When using .* (any character except newline characters, 0 or more times) or .+ (any character except newline characters, 1 or more times), it is also important to give the Perl regular expression engine an anchor for where to start and where to stop matching characters. The end is defined by \r\n. But the beginning is not defined in your search expression. That is why I added ^, which matches the beginning of a line. I have often seen unexpected find/replace results when .* or .+ is used without specifying in the search string where matching should start and end.
This search expression matches 1000 complete lines, including their carriage returns and line feeds, and $1 or \1 could be used to back-reference this block and insert the string ZZZZZ at the start of the next line.
But ZZZZZ should be inserted at the end of each 1000th line, not at the beginning of the next line.
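To make the placement problem concrete, here is a rough Python sketch with blocks of 3 lines instead of 1000. It is not the editor's Perl engine: Python's . also matches \r, so the sketch uses [^\r\n] to mimic "any character except newline characters", and mark_after_block is a hypothetical helper name.

```python
import re

def mark_after_block(text, n, marker="ZZZZZ"):
    # n complete non-empty lines, each including its \r\n terminator.
    pattern = re.compile(r"^((?:[^\r\n]+\r\n){%d})" % n, re.MULTILINE)
    # Equivalent of the replace string \1ZZZZZ.
    return pattern.sub(lambda m: m.group(1) + marker, text)

sample = "\r\n".join("abcdefg")  # seven one-letter lines, CRLF separated
result = mark_after_block(sample, 3)
print(result.split("\r\n"))
# ['a', 'b', 'c', 'ZZZZZd', 'e', 'f', 'ZZZZZg']
```

Because the matched block ends after the line terminator, the marker ends up in front of the 4th and 7th lines rather than at the end of the 3rd and 6th.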
For that reason this search expression is required: ^((?:.*?\r\n){999}.*)$
The replace string is \1ZZZZZ or $1ZZZZZ
The search string, which starts each search at the beginning of a line (very important here), matches 999 lines with 0 or more characters each, followed by 0 or more characters (greedy) on the 1000th line, up to but not including that line's carriage return and line feed. With the Perl regular expression engine, $ also matches at the end of the file. Therefore this search string also works, for example, for a file with exactly 5000 lines whose last line has no line termination.
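For anyone who prefers to do this outside the editor, the same idea can be sketched in Python. This is an adaptation, not the expression above verbatim: Python's . matches \r and its $ stops before \n only, so the sketch spells the line body as [^\r\n]* and drops the trailing $. The function name and the marker parameter are assumptions made for the example.

```python
import re

def mark_every_nth_line(text, n, marker="ZZZZZ"):
    # n-1 complete lines (possibly empty) with their \r\n, followed by
    # the text of the n-th line without its line terminator.
    pattern = re.compile(
        r"^((?:[^\r\n]*\r\n){%d}[^\r\n]*)" % (n - 1), re.MULTILINE
    )
    # Equivalent of the replace string \1ZZZZZ.
    return pattern.sub(lambda m: m.group(1) + marker, text)

# Six lines, the last one without a line terminator.
sample = "\r\n".join("abcdef")
print(mark_every_nth_line(sample, 3).split("\r\n"))
# ['a', 'b', 'cZZZZZ', 'd', 'e', 'fZZZZZ']
```

Calling mark_every_nth_line(data, 1000) would correspond to the search and replace strings given above, under the assumption that the data uses \r\n line endings.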
Why is ^ so important here for getting the right result?
After ZZZZZ has been inserted at the end of a 1000th line, the current position is at the end of that line, immediately before its carriage return and line feed. Without ^ the next search would start by matching the \r\n of the current 1000th line (the .*? in front of it may match 0 characters) instead of starting at the beginning of the next line. That line ending would then count as the first of the 999 lines, so the next ZZZZZ would be inserted one line too early.
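The difference the anchor makes can be checked with a small Python sketch, again with blocks of 3 lines instead of 1000 and with [^\r\n]* standing in for .* because Python's . matches \r. BODY and marked_lines are hypothetical names used only for this illustration.

```python
import re

# Two complete lines plus the text of the third line.
BODY = r"((?:[^\r\n]*\r\n){2}[^\r\n]*)"

def marked_lines(pattern, text, marker="ZZZZZ"):
    out = re.sub(pattern, lambda m: m.group(1) + marker, text,
                 flags=re.MULTILINE)
    # 1-based numbers of the lines that received the marker.
    return [i for i, line in enumerate(out.split("\r\n"), start=1)
            if line.endswith(marker)]

sample = "\r\n".join("abcdefg")  # seven one-letter lines

print(marked_lines("^" + BODY, sample))  # [3, 6]
print(marked_lines(BODY, sample))        # [3, 5, 7]
```

With the anchor, every 3rd line is marked. Without it, each search after the first resumes in front of the previous line ending and counts that ending as a line, so the markers drift to lines 5 and 7.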