I want to get a byte offset of a string pattern from a binary file on embedded linux platform.
If I can use "grep -b" option, It would be best way but It is not supported on my machine.
machine does not support
ADDR=`grep -oba <pattern string> <file path> | cut -d ":" -f1`
Here the manual of grep command on the machine.
root# grep --help
BusyBox v1.29.3 () multi-call binary.
Usage: grep \[-HhnlLoqvsriwFE\] \[-m N\] \[-A/B/C N\] PATTERN/-e PATTERN.../-f FILE \[FILE\]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Since that option isn't available, I'm looking for an alternative.
the combination of hexdump and grep can be also useful
such as
ADDR=`hexdump <file path> -C | grep <pattern string> | cut -d' ' -f1`
But if pattren spans multiple lines, it will not be found.
Is there a way to find the byte offset of a specific pattern with a Linux command?
Set the pattern as the record separator in awk. The offset of the occurrence is the length of the first record. BusyBox awk treats RS
as an extended regular expression, so add backslashes before any of .[]\*+?^$
in the pattern string.
<myfile.bin awk -v RS='pattern' '{print length($0); exit}'
If the pattern contains a null byte, you need a little extra work. Use tr
to exchange null bytes with some byte value that doesn't appear in the pattern. For example, if the pattern's hex dump is 00002a61
:
<myfile.bin tr '\0!' '!\0' | awk -v RS='!!-A' '{print length($0); exit}'
If the pattern is not found, this prints the length of the whole file. So if you aren't sure whether the pattern is present, you need again some extra work. Append some text that can't be part of a pattern match to the file, so that you know that if there's a match, it won't be at the very end of the file. Then, if the pattern is present, the file will contain at least two records. But if the pattern is not present, the file only contains the first record (without a record separator after it).
{ cat myfile.bin; echo garbage; } |
LC_ALL=C awk -v RS='pattern' '
NR==1 {n = length($0)}
NR==2 {print n; found = 1; exit}
END {exit !found}
'
LC_ALL=C
forces awk to use a single-byte locale with no invalid characters. It's necessary if your ambient locale is multibyte and your awk implementation is locale-aware (e.g. mawk or gawk), and harmless otherwise (e.g. with BusyBox).