regex bash macos sed bsd

Extract string from binary file - regex issue


I have a binary file that contains a readable filename* bounded by 'namexx:' and 'xx:piece', where x is any digit from 0-9 in both cases.

I am working on a Mac in bash 5.

I have tried using sed:

cat filename.xxx | sed -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/'

The problem is that the regex does not consume the whole file, so I get a lot of random stuff returned in addition to the captured filename.

I've tried prefixing sed with LC_ALL=C as I read in another answer that this will treat all binary data as 'consumable' with wildcards, but it makes no difference (and I may have misunderstood).

I have also tried removing the beginning and end anchors, but that makes no difference either.


*The file is a torrent file from which I just want to extract the filename. I have looked into parsing the bencoding to extract the filename, but that seemed too complex for a trivial task.


Solution

  • You may use

    sed -n -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/p;' filename.xxx
    

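    Here, -n suppresses sed's default printing of every input line, and the p flag prints only the lines on which the substitution succeeded (that is, what remains after the replacement). Without -n, sed also prints every non-matching line unchanged, which is where the extra output comes from.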

    As an alternative, you may use something like

    grep -m 1 -o 'name[0-9]\{2\}:\(.*\)[0-9]\{2\}:piece' filename.xxx | \
       sed -E 's/^name[0-9]{2}:(.*)[0-9]{2}:piece$/\1/'
    

    Here, grep -o outputs only the matched text, -m 1 makes grep stop after the first matching line, and sed then keeps only the value of the capturing group.
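
To try both commands without a real torrent file, the sketch below builds a small stand-in file and runs each variant on it. The function names extract_with_sed and extract_with_grep, the temporary file, and its contents are made up for illustration. LC_ALL=C and grep's -a flag are additions not present in the commands above; they assume that the tools might otherwise stumble on non-text bytes, or that grep might only report that a binary file matches.

```shell
#!/usr/bin/env bash
# Sketch only: the sample data below is hypothetical, not a valid torrent file.

extract_with_sed() {
  # -n turns off default printing; the trailing p prints only substituted lines.
  # LC_ALL=C (an assumption) makes sed treat the input as plain bytes.
  LC_ALL=C sed -n -E 's/^.*name[0-9]{2}:(.*)[0-9]{2}:piece.*$/\1/p' "$1"
}

extract_with_grep() {
  # -a (an assumption) tells grep to treat binary input as text,
  # -o prints only the matched part, -m 1 stops after the first matching line.
  LC_ALL=C grep -a -m 1 -o 'name[0-9]\{2\}:.*[0-9]\{2\}:piece' "$1" |
    LC_ALL=C sed -E 's/^name[0-9]{2}:(.*)[0-9]{2}:piece$/\1/'
}

sample=$(mktemp)
# Non-text bytes and newlines surround the line that holds the name,
# so sed and grep see several "lines", only one of which matches.
printf 'd4:infod6:lengthi42e\n\001\002junk\n4:name10:ubuntu.iso12:piece lengthi16384e\n\003more\n' > "$sample"

extract_with_sed "$sample"
extract_with_grep "$sample"

rm -f "$sample"
```

On this sample, each call prints ubuntu.iso. Dropping -n and the p flag from the sed variant would also print the three non-matching lines unchanged.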