Can someone explain what MATLAB is doing with nul bytes (\x00) in regular expressions?
Examples:
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
4 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
>> regexp(char([0 0 0 0 10 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
[] % expected
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
The answer might simply be that MATLAB regular expressions aren't meant to handle non-printable characters, but I would expect an error if that were the case.
EDIT: The 46 is meant to be '.', the regex wildcard.
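For reference, the "expected" values can be cross-checked without regexp at all. The snippet below is only a sketch: matchesAt and naiveDotMatch are hypothetical helpers written for this post, not MATLAB built-ins. They assume the pattern contains nothing but literal characters and the '.' wildcard, and they treat char(0) like any other literal. Unlike regexp, the helper reports every start index, including overlapping ones.

```matlab
% Hypothetical helpers (not MATLAB built-ins): brute-force reference matcher.
% '.' in pat is a one-character wildcard; every other character,
% including char(0), must match literally.
matchesAt = @(str, pat, k) all(pat == '.' | pat == str(k:k+numel(pat)-1));
naiveDotMatch = @(str, pat) find(arrayfun(@(k) matchesAt(str, pat, k), ...
                                          1:numel(str)-numel(pat)+1));

pat = char([0 0 0 0 46 0 0 10]);

% First example: the only window that fits starts at index 4.
naiveDotMatch(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]), pat)    % 4

% Fourth example: no window fits, so the result is empty.
naiveDotMatch(char([0 0 0 0 10 0 0 1 0 0 10 0 0 0]), pat)   % empty
```

Under these assumptions the helper agrees with the "expected" column for those two cases, which is where regexp returns 1 instead.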
EDIT2:
>> regexp(char([0 0 0 0 50 0 0 100 0 0 90 0 0 0]),char([0 0 46 0 0 90]))
ans =
1 9
I realized the 10 (newline) could have been treated as a special character, so this example uses only printable characters and nul bytes. I would expect this one to match only at 9, because the fifth character, 50, does not match 0.
This bug is probably already fixed. I tested your example from MATLAB Central in several versions:
in R2013b:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2015a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2016a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
[]
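If your code has to run on older releases too, a version guard is one way to flag the problem. This is a rough sketch that assumes the behaviour changed in R2016a (MATLAB version 9.0). Only the three releases above were tested, so R2015b is unverified, and the warning identifier is made up for this example.

```matlab
% Assumption: regexp handles char(0) correctly from R2016a (version 9.0) on.
% Only R2013b, R2015a and R2016a were tested, so the cut-off is a guess.
if verLessThan('matlab', '9.0')
    % 'regexpNul:oldRelease' is a hypothetical identifier for this example.
    warning('regexpNul:oldRelease', ...
        'regexp may mishandle char(0) in this release (%s).', version);
end
```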