A Question of Greedy vs. Negated Character Classes in Regex

I have a very large file that looks like the sample below. I have a few candidate regexes to use on it (I know there may be others, but I'm really trying to compare the greedy and negated character class methods):

ftp: [^\D]{1,}
ftp: (\d)+
ftp: \d+

Note: what if I took off the parentheses around the \d?

Now, + is greedy, which forces backtracking, but the negated character class requires a char-by-char comparison. Which is more efficient? Assume the file is very, very large, so minute differences in processor usage will be magnified by the length of the file.

Now that you've answered that: what if my negated character class were very large, say 18 different characters? Would that change your answer?

Thanks.

ftp: 1117 bytes
ftp: 5696 bytes
ftp: 3207 bytes
ftp: 5696 bytes
ftp: 7200 bytes

Solution

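All of your expressions have the same greediness: {1,} and + are the same quantifier, and [^\D] ("not a non-digit") matches exactly what \d matches. As others have said here, except for the capturing group they will execute in the same way.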

Also, in this case greediness won't matter much for execution speed, since nothing follows the digit quantifier. The expression will simply consume all the digits it can find and stop when it reaches the space. No backtracking should occur with these expressions.
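As a quick check of the claim that the patterns from the question behave the same, here is a minimal sketch using Python's re module (an assumption; the question does not say which regex flavor is in use). The names SAMPLE, PATTERNS, and matched_texts are made up for illustration. It also shows the one visible difference: a repeated capturing group keeps only its last iteration.

```python
import re

# Sample lines copied from the question.
SAMPLE = """ftp: 1117 bytes
ftp: 5696 bytes
ftp: 3207 bytes
ftp: 5696 bytes
ftp: 7200 bytes
"""

# The three patterns from the question.
PATTERNS = [r"ftp: [^\D]{1,}", r"ftp: (\d)+", r"ftp: \d+"]


def matched_texts(pattern, text=SAMPLE):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


results = {p: matched_texts(p) for p in PATTERNS}
for pattern, found in results.items():
    print(pattern, found)

# The only observable difference: (\d)+ also stores a capture,
# and a repeated group keeps just the last digit it matched.
last_captured = re.search(r"ftp: (\d)+", "ftp: 1117 bytes").group(1)
print(last_captured)
```

All three patterns should report the same five matches. Whether the capturing version is measurably slower depends on the engine, so it is worth timing on your real file rather than assuming.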

To make it more explicit, backtracking should occur if you have an expression like this:

\d*123

In this case the regex engine will consume all the digits, then backtrack, giving digits back one at a time until the trailing 123 can match.
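To make the backtracking visible, here is a toy model of how a backtracking engine might handle \d*123. It is a hand-written sketch, not how any real engine is implemented; the function name match_digits_then_literal and its step counter are hypothetical. The result is compared against Python's re module (again an assumed flavor) only to confirm both agree on where the match ends.

```python
import re


def match_digits_then_literal(text, literal="123"):
    """Toy model of matching \\d*<literal> at the start of text.

    Returns (end_of_match or None, number_of_backtrack_steps).
    """
    # Greedy phase: \d* consumes every leading digit.
    end = 0
    while end < len(text) and text[end].isdigit():
        end += 1

    # Backtracking phase: give back one digit at a time until
    # the literal can match at the current position.
    backtracks = 0
    pos = end
    while pos >= 0:
        if text.startswith(literal, pos):
            return pos + len(literal), backtracks
        pos -= 1
        backtracks += 1
    return None, backtracks


toy_end, toy_steps = match_digits_then_literal("4567123")
real_end = re.match(r"\d*123", "4567123").end()
print(toy_end, toy_steps, real_end)
```

On the input 4567123 the greedy phase swallows all seven digits, and the model has to give back three of them before 123 fits. With nothing after the quantifier, as in the patterns from the question, that second phase never runs.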