Search code examples
regexmatlabnul

nul bytes in regexp MATLAB


Can someone explain what MATLAB is doing with nul bytes (x00) in regular expressions?

Examples:

>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
      1  % current
      4  % expected

>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
      4  % current
      4  % expected

>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
      [] % current
      [] % expected

>> regexp(char([0 0 0 0 10 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
      1  % current
      [] % expected

>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
      [] % current
      [] % expected

The answer might simply be, MATLAB regular expression isn't meant to handle non printable characters, but I would assume it would error if this was the case.

EDIT: The 46 is expected to be '.' as in the regex wildcard.

EDIT2:

>> regexp(char([0 0 0 0 50 0 0 100 0 0 90 0 0 0]),char([0 0 46 0 0 90]))
ans =
     1    9

I realized it could have been 10 being a special character so this one has only printable and nul bytes. I would expect this one to only match 9 because the fifth character 50 does not match 0.


Solution

  • this bug is probably already fixed. I tested your example from Matlab Central in several versions:

    in R2013b:

    >> regexp(char([0 0 1 0  41 41 41 41 41 41]),char([0 '.' 0  40 40 40 40]))    
    ans =
    
         2
    

    in R2015a:

    >> regexp(char([0 0 1 0  41 41 41 41 41 41]),char([0 '.' 0  40 40 40 40]))   
    ans =  
    
         2
    

    in R2016a:

    >> regexp(char([0 0 1 0  41 41 41 41 41 41]),char([0 '.' 0  40 40 40 40]))
    ans = 
    
         []