php binary preg-match preg-match-all hexdump

PHP - preg_match fail


Could someone tell me why the following preg_match search works:

preg_match("/\xF0\x49\xF7\xF8..\xF3\xF8/s", $bin, $matches2, PREG_OFFSET_CAPTURE);

while this doesn't give any result:

preg_match("/\x3F.\x0D\x01\x3E.\xF3\xFA..\x43\xFA.\x04\xFD\x02/s", $bin, $matches, PREG_OFFSET_CAPTURE);

Both possibilities are inside $bin.

Further question:

What is the best way to search the $bin file for the following byte sequence, where each XX is a variable byte that could be anything? There may be one match or more, and I need at least the starting position of each match.

I need to search for this:

3F XX 0D 01 3E XX F3 FA XX XX 43 FA XX 04 FD 02

Example matches:

**4 example matches**
1) 3F 64 0D 01 3E 64 F3 FA 86 F8 43 FA E1 04 FD 02 
2) 3F 5C 0D 01 3E 5C F3 FA 9C F8 43 FA B6 04 FD 02 
3) 3F 5B 0D 01 3E 5B F3 FA 9A F8 43 FA 69 04 FD 02 
4) 3F 6B 0D 01 3E 6B F3 FA 78 F8 43 FA 38 04 FD 02 

I can search in $bin directly, where $bin contains raw binary, or convert it first, e.g. with bin2hex($bin).

I have found the following approach, and it seems to work, but is it a "nice" and fast way to do this? My script already has more than 300 MB of RAM allocated, and I want to make it a bit more resource-friendly.

preg_match_all("/3F[A-Z0-9]{2}0D013E[A-Z0-9]{2}F3FA[A-Z0-9]{4}43FA[A-Z0-9]{2}04FD02/", $binhex, $matches, PREG_OFFSET_CAPTURE);

Solution

  • Against your sample data as posted (space-separated hex pairs), your latest regex was missing several spaces, and the {4} group wasn't going to match your samples. Corrected, it looks like this:

    3F [A-F\d]{2} 0D 01 3E [A-F\d]{2} F3 FA [A-F\d]{2} [A-F\d]{2} 43 FA [A-F\d]{2} 04 FD 02

    This runs in 172 steps, which is nothing to be disappointed about.
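As a rough sketch of how the corrected pattern could be used to collect start positions, the snippet below runs it with preg_match_all and PREG_OFFSET_CAPTURE. The $hex string is just two of the sample lines from the question, assumed to be uppercase and space-separated exactly as posted. The output of bin2hex() is lowercase with no separators, so the pattern would need adapting for that input.

```php
<?php
// Two of the question's sample lines: uppercase hex pairs separated by spaces,
// each line ending in a trailing space and a newline (an assumption about the input).
$hex = "3F 64 0D 01 3E 64 F3 FA 86 F8 43 FA E1 04 FD 02 \n"
     . "3F 5C 0D 01 3E 5C F3 FA 9C F8 43 FA B6 04 FD 02 \n";

// The corrected pattern from the answer, wrapped in delimiters.
$pattern = '/3F [A-F\d]{2} 0D 01 3E [A-F\d]{2} F3 FA [A-F\d]{2} [A-F\d]{2} 43 FA [A-F\d]{2} 04 FD 02/';

$count = preg_match_all($pattern, $hex, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, each entry of $matches[0] is [matched text, offset].
$offsets = array_column($matches[0], 1);

foreach ($matches[0] as [$text, $offset]) {
    echo "offset $offset: $text\n";
}
```

The offsets here are character positions in the hex text, not byte positions in the original binary.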

    When selecting the right regex pattern for your project, it is best to identify your priorities across:

    1. Pattern Brevity & Readability - more of a concern if working in a team or updating frequently.

    2. Pattern Speed/Steps - certainly a concern when working with big volumes of data.

    3. Pattern Validation Strength - it is the dev's responsibility to understand what is necessary.

    Let's consider a few options that I've prepared (there are more, regex is a rabbit hole).

    (?:.. ?){16}
    272 steps: this prioritizes pattern brevity, at a large cost to validation strength and speed.

    (?:[A-F\d]{2} ?){16}
    208 steps: this prioritizes brevity, with slightly improved validation and speed.

    3F \d[A-F\d] 0D 01 3E \d[A-F\d] F3 FA \d[A-F\d] F8 43 FA [A-F\d]\d 04 FD 02
    192 steps: this is very literal and prioritizes validation, at a cost to pattern length and speed.

    3F [A-F\d]{2} 0D 01 3E [A-F\d]{2} F3 FA [A-F\d]{2} [A-F\d]{2} 43 FA [A-F\d]{2} 04 FD 02
    172 steps: the quantifier {2} increases speed, but with a slight impact on validation because of the broadened character ranges in each pair.

    [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2} [A-F\d]{2}
    135 steps: this is the largest sensible trade-off when prioritizing speed; pattern brevity and validation strength both suffer considerably.

    [A-F\d ]{47}
    12 steps: if validation only needs to protect against malicious strings rather than check string quality, this may be the way to go.

    Then again, if your demand on validation is this low, then perhaps avoid the expense of regex/preg_match_all altogether. Perhaps use str_split($str,49) or similar.
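A minimal sketch of the str_split idea, assuming the input consists only of fixed-width records laid out like the posted samples (47 characters of hex pairs, a trailing space, and a newline, giving 49 characters per record). That layout is an assumption; str_split performs no validation, so any deviation in record width would shift every later chunk.

```php
<?php
// Assumed input: nothing but 49-character records, as in the posted samples.
$str = "3F 64 0D 01 3E 64 F3 FA 86 F8 43 FA E1 04 FD 02 \n"
     . "3F 5C 0D 01 3E 5C F3 FA 9C F8 43 FA B6 04 FD 02 \n";

// Cut the string into 49-character chunks, then strip the trailing space and newline.
$records = array_map('rtrim', str_split($str, 49));

foreach ($records as $i => $record) {
    // With fixed-width records, the start position of record $i is simply $i * 49.
    echo ($i * 49) . ": $record\n";
}
```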

    So the decision is purely yours, but it is best to have a few options to compare and contrast.

    Whenever you have questions or doubts about a regex pattern, head over to regex101.com, throw in some sample data, and have a play with a few regex patterns. The page will show you errors, steps/speed, and captured data -- it's pretty handy.
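On the first part of the question (why the raw-binary pattern returns nothing), one likely cause is the double-quoted string. PHP expands "\x3F" to a literal "?" before PCRE ever sees the pattern, so the regex begins with a quantifier that has nothing to repeat, and preg_match fails with a compilation warning. The first pattern contains no byte that happens to be a regex metacharacter, which is why it works. The sketch below uses single quotes so that PCRE itself interprets \x3F as a literal byte. The contents of $bin are a hypothetical stand-in built from two of the sample records. Searching $bin directly also avoids the extra copy that bin2hex() creates, which is twice the size of the input.

```php
<?php
// Hypothetical stand-in for the real file contents:
// two padding bytes, a record, one filler byte, and another record.
$bin = "\x00\x00"
     . hex2bin('3F640D013E64F3FA86F843FAE104FD02')
     . "\xFF"
     . hex2bin('3F5C0D013E5CF3FA9CF843FAB604FD02');

// Single quotes: PHP leaves \x3F untouched, and PCRE reads it as the byte 0x3F
// rather than as the "?" quantifier. The /s flag lets "." match any byte.
$pattern = '/\x3F.\x0D\x01\x3E.\xF3\xFA..\x43\xFA.\x04\xFD\x02/s';

$count = preg_match_all($pattern, $bin, $matches, PREG_OFFSET_CAPTURE);

// Offsets are byte positions in $bin, because no /u flag is used.
$offsets = array_column($matches[0], 1);

foreach ($offsets as $offset) {
    printf("match at byte offset %d (0x%X)\n", $offset, $offset);
}
```

The offsets returned this way are byte positions in the original data, so they do not need to be converted back from positions in a hex string.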