I'm trying to find 7zip version 3 file headers in a file. According to the documentation they should look like this:
00: 6 bytes: 37 7A BC AF 27 1C - Signature
06: 2 bytes: 00 04 - Format version
So I constructed this grep command which should match them:
grep --only-matching --byte-offset --binary --text $'7z\xBC\xAF\x27\x1C\x00\x03'
Yet it also matches the string ending in 0000:
% xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" | grep --only-matching --byte-offset --binary --text $'7z\xBC\xAF\x27\x1C\x00\x03'
2:7z'
13:7z'
The output I expect to have is just 13:7z'
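It's not possible to pass a zero byte as part of an argument. Command-line arguments are C strings, and a C string ends at the first zero byte, so grep, when running strlen(argv[...]), will not "see" the zero byte or anything after it. The pattern grep actually receives is just the six signature bytes, which is why both offsets match.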
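To see the truncation for yourself, here is a small sketch (bash only, since it relies on $'string'; the variable names are made up for illustration) that dumps the bytes that survive in the pattern string before grep ever runs:

```bash
#!/usr/bin/env bash
# The same pattern as in the question, built with bash ANSI-C quoting.
pattern=$'7z\xBC\xAF\x27\x1C\x00\x03'

# Hex-dump what is actually stored in the variable (and so passed as argv).
pattern_hex=$(printf '%s' "$pattern" | od -An -tx1 | tr -d ' \n')

echo "$pattern_hex"   # prints 377abcaf271c: the 00 03 suffix is gone
```

Only the six signature bytes are left, so the command in the question is effectively searching for the bare signature.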
If there are no newlines in the regex, you can have grep read the pattern from a file with --file= (-f) instead of passing it as an argument:
xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" |
LC_ALL=C grep --only-matching --byte-offset --binary --text -f <(
echo -n 7z;
echo BCAF271C0003 | xxd -r -p
)
See https://www.gnu.org/software/grep/manual/grep.html#Matching-Non_002dASCII
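Alternatively, use a Perl-compatible regex (-P). The pattern is passed in plain single quotes, so grep receives the literal text \x00 and the regex engine interprets the escape itself: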
xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" |
LC_ALL=C grep --only-matching --byte-offset --binary --text -P '7z\xBC\xAF\x27\x1C\x00\x03'
When dealing with binary data, remember to disable UTF-8 sequence handling by setting the locale to LC_ALL=C.
Note: <<<"" and $'string' are not available in every shell; they are available in bash.
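If you need this outside bash, a possible variant is to write both the sample input and the pattern with printf octal escapes, which POSIX printf understands. This is only a sketch: the file names sample.bin and pattern.bin are made up, and it assumes GNU grep, which (as described above) accepts a zero byte in a pattern read via -f.

```shell
# Scratch directory for the two hypothetical files.
tmpdir=$(mktemp -d)

# Sample input from the question: aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003
printf '\252\2527z\274\257\047\034\000\000\273\273\0007z\274\257\047\034\000\003' > "$tmpdir/sample.bin"

# Pattern: signature followed by 00 03, written without a trailing newline.
printf '7z\274\257\047\034\000\003' > "$tmpdir/pattern.bin"

# Keep only the byte offsets; the matched bytes themselves are binary.
offsets=$(LC_ALL=C grep --only-matching --byte-offset --binary --text \
    -f "$tmpdir/pattern.bin" "$tmpdir/sample.bin" | LC_ALL=C cut -d: -f1)

echo "$offsets"
rm -r "$tmpdir"
```

With the full eight-byte pattern in place, the only offset reported should be 13, the result the question asks for.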