I have a binary file that contains a readable filename* bounded by 'namexx:' and 'xx:piece', where x is any digit from 0-9 in both cases.
I am working on a Mac in bash 5.
I have tried using sed:
cat filename.xxx | sed -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/'
The problem is that the regex does not consume the whole file, so I get a lot of random stuff returned in addition to the captured filename.
I've tried prefixing sed with LC_ALL=C, as I read in another answer that this will treat all binary data as 'consumable' with wildcards, but it makes no difference (and I may have misunderstood).
I have also tried removing the beginning and end anchors, but that makes no difference either.
*The file is a torrent file from which I just want to extract the filename. I looked into parsing the bencoding to extract the filename, but it seemed too complex for such a trivial task.
You may use
sed -n -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/p;' filename.xxx
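Here, -n suppresses sed's automatic printing of every line, and the p flag prints only the lines where the substitution succeeded (what remains after replacement). This matters because sed works line by line: a binary file usually contains stray newline bytes, so without -n every "line" that does not match the pattern is echoed unchanged, which is the random output you saw.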
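To see the difference, here is a minimal sketch on a made-up sample file. The file name sample.torrent and its contents are assumptions for illustration only, not a real torrent. LC_ALL=C is kept so that bytes that are not valid in the current locale are still matched by the wildcards.

```shell
# Build a small stand-in for a torrent file (made-up contents): bencoded text
# followed by raw bytes, with newline bytes inside the binary part.
printf 'd4:infod6:lengthi1024e4:name10:ubuntu.iso12:piece lengthi262144e6:pieces20:\001\002\n\377\376rest\nmore\200junkee\n' > sample.torrent

# Without -n, sed also echoes the two "lines" that do not match,
# so all 3 lines come back.
LC_ALL=C sed -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/' sample.torrent | wc -l

# With -n and the p flag, only the line where the substitution happened is printed.
name=$(LC_ALL=C sed -n -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/p' sample.torrent)
printf '%s\n' "$name"
```

On this sample the second command prints only the captured name, ubuntu.iso.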
As an alternative, you may use something like
grep -m 1 -o 'name[0-9]\{2\}:\(.*\)[0-9]\{2\}:piece' filename.xxx | \
sed -E 's/^name[0-9]{2}:(.*)[0-9]{2}:piece$/\1/'
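Here, grep prints only the matched text (-o) and stops after the first matching line (-m 1), and then sed keeps only the capturing group value from that result. If grep only reports that the binary file matches instead of printing the text, add -a so that it treats the file as text.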
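The pipeline can be tried the same way on a made-up sample. This is only a sketch: the file name sample2.torrent and its contents are invented for illustration, and -a and LC_ALL=C are added here as assumptions so that grep and sed treat the raw bytes as plain text.

```shell
# Made-up sample: bencoded text followed by raw bytes and stray newlines.
printf 'd4:infod6:lengthi1024e4:name10:ubuntu.iso12:piece lengthi262144e6:pieces20:\001\002\n\377\376rest\nmore\200junkee\n' > sample2.torrent

# -a: treat the binary file as text; -o: print only the matched part;
# -m 1: stop after the first matching line. sed then strips the delimiters.
name=$(LC_ALL=C grep -a -m 1 -o 'name[0-9]\{2\}:\(.*\)[0-9]\{2\}:piece' sample2.torrent |
  LC_ALL=C sed -E 's/^name[0-9]{2}:(.*)[0-9]{2}:piece$/\1/')
printf '%s\n' "$name"
```

On this sample, grep emits name10:ubuntu.iso12:piece and sed reduces it to ubuntu.iso.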