
Extract hex strings from binary file in Ruby


For the binary file, I want to extract the hex strings highlighted in green (date and hour) and in blue. The hex string in blue lies between the bytes 09 and 00.

Using a regex, I've been able to extract the date and hour and, partially, the hex string in blue. To do this, I set the byte 09 (\x09) as the "line separator".

The issue could perhaps be fixed with a regex that captures the string between 09 and 00, but with my current regex (^20.*) I'm getting unwanted, non-ASCII bytes. Could someone help me get only the bytes between 09 and 00?

My current code:

BEGIN{  $/="\x09".force_encoding("BINARY")   }

IO.foreach("file.dat"){ |l|

    line = l.unpack('H*')[0]
    next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|(^20.*)/

        if ( $1 != nil and $2 != nil )
            date = $1
            hour = $2
            p date.gsub(/../) { |b| b.hex.chr }
            p hour.gsub(/../) { |b| b.hex.chr } 
        end

        if $3 != nil            
            p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }         
        end 
}

Current output

"2017-10-19"
"15:43:27"
"83492624790981030E100000\x00\x18\v\x16\x84\x13\x05$B#q\x000\x03\x81\x01\n\x00\x00v\x00\x0000000003\t"
"2017-12-05"
"09:32:15"
"001104059419632801001B237100300381010A0000\x00\x00\x00\x00\x02\xD0\x00\x00\x00\b\xFEF\xCC\x00\x06\xE7\f\x13\x0F+\e\xB5\xE1/\x00\xB5\x83I&$y\t"
=> nil

Expected output

"2017-10-19"
"15:43:27"
"83492624790981030E100000"
"2017-12-05"
"09:32:15"
"001104059419632801001B237100300381010A0000"
=> nil

The file looks like this: [image: hex view of file.dat, with the date and hour highlighted in green and the target hex string in blue]

Attached sample file: file.dat


Solution

  • In order to get the bytes starting with 20 and ending with 00 you need to change the regex like this:

    next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/
    

    Basically I changed only the last part of the regex from (^20.*) to ^20(.*?0?)0{2}. Here's the explanation:

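    • start from 20 - ^20
    • match as little as possible - .*?
    • until you get to two consecutive 0s (the hex digits of the 00 byte) - 0{2}
    • the 0? after .*? handles the case where the byte right before 00 ends in 0 (X0 00); without it, the lazy match would stop one digit early and drop that trailing 0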
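    The points above can be checked on small inputs. The following is a minimal sketch, not part of the original answer: the constant names, the to_hex helper, and the sample bytes are made up for illustration, and only the two patterns come from the discussion above.

```ruby
# Last alternative of the pattern, taken on its own (from the answer above).
TAIL = /^20(.*?0?)0{2}/
# Same pattern without the optional 0, for comparison only.
TAIL_WITHOUT_OPTIONAL_ZERO = /^20(.*?)0{2}/

# Hex-encode a binary string the same way the question's script does.
def to_hex(bytes)
  bytes.unpack('H*')[0]
end

# Made-up payload ending in "0" (0x30): the hex reads ...30 00, the "X0 00" case.
ends_in_zero = to_hex(" 100000\x00\x18\x0B".b)
# Made-up payload ending in "A" (0x41): the hex reads ...41 00.
ends_in_a = to_hex(" 010A\x00\x18\x0B".b)

p ends_in_zero[TAIL, 1]                        # => "313030303030"
p ends_in_a[TAIL, 1]                           # => "30313041"
# Without 0? the capture loses its final hex digit and has an odd length.
p ends_in_zero[TAIL_WITHOUT_OPTIONAL_ZERO, 1]  # => "31303030303"
```

    Decoding the first capture with gsub(/../) { |b| b.hex.chr } gives back "100000", while the odd-length capture from the comparison pattern can no longer be split cleanly into bytes.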

    Also, I'm not including 20 in the captured group, since you remove it later in the code anyway, so you can drop the .gsub(/20/,"") from

    p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }
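
    Putting both changes together, the loop might look like the sketch below. It is an illustration under assumptions rather than a tested drop-in: extract_fields and hex_to_ascii are hypothetical helper names, and sample is a hand-built stand-in for the contents of file.dat (a few junk bytes, one date/hour record, and one payload record), so the logic can be run without the file.

```ruby
# Pattern from the answer: date, hour, or a record starting with 20 and cut at 00.
PATTERN = /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/

# Turn a string of hex digit pairs back into characters.
def hex_to_ascii(hex)
  hex.gsub(/../) { |b| b.hex.chr }
end

# Hypothetical helper: splits the data on byte 09 and collects the decoded fields.
def extract_fields(data)
  results = []
  data.each_line("\x09".b) do |record|
    line = record.unpack('H*')[0]
    m = PATTERN.match(line)
    next unless m

    # Date and hour come from the first alternative.
    results << hex_to_ascii(m[1]) << hex_to_ascii(m[2]) if m[1] && m[2]
    # The leading 20 is outside group 3, so no gsub(/20/, "") is needed.
    results << hex_to_ascii(m[3]) if m[3]
  end
  results
end

# Made-up stand-in for file.dat: junk bytes, a timestamp record, a payload record.
sample = "\x01\x022017-10-19 15:43:27\t 83492624790981030E100000\x00\x18\x0B\x1600000003\t".b

p extract_fields(sample)
# => ["2017-10-19", "15:43:27", "83492624790981030E100000"]
```

    One caveat worth keeping in mind: the pattern works on hex digits, not on whole bytes, so a pair of 0 digits that straddles two bytes (for example the bytes 10 0A, which encode to 100a) would also end the match. That cannot happen while the payload is printable ASCII, as in the sample output, because no printable ASCII byte starts with the hex digit 0.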