For the binary file, I want to extract the hex strings highlighted in green (date and hour) and in blue. The hex string in blue lies between the bytes 09 and 00.
Using a regex, I've been able to extract the date and hour and, partially, the hex string in blue. To do this I set the "line separator" to the byte 09 (\x09).
The issue could perhaps be fixed with a regex that captures the string between 09 and 00, but with my current regex (^20.*) I'm getting undesired non-ASCII bytes. Could someone help me get only the bytes between 09 and 00?
My current code:
BEGIN{ $/="\x09".force_encoding("BINARY") }
IO.foreach("file.dat"){ |l|
  line = l.unpack('H*')[0]
  next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|(^20.*)/
  if ( $1 != nil and $2 != nil )
    date = $1
    hour = $2
    p date.gsub(/../) { |b| b.hex.chr }
    p hour.gsub(/../) { |b| b.hex.chr }
  end
  if $3 != nil
    p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }
  end
}
Current output
"2017-10-19"
"15:43:27"
"83492624790981030E100000\x00\x18\v\x16\x84\x13\x05$B#q\x000\x03\x81\x01\n\x00\x00v\x00\x0000000003\t"
"2017-12-05"
"09:32:15"
"001104059419632801001B237100300381010A0000\x00\x00\x00\x00\x02\xD0\x00\x00\x00\b\xFEF\xCC\x00\x06\xE7\f\x13\x0F+\e\xB5\xE1/\x00\xB5\x83I&$y\t"
=> nil
Expected output
"2017-10-19"
"15:43:27"
"83492624790981030E100000"
"2017-12-05"
"09:32:15"
"001104059419632801001B237100300381010A0000"
=> nil
Attached sample file: file.dat
To get the bytes starting with 20 and ending with 00, you need to change the regex like this:
next unless line =~ /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/
Basically, I changed only the last part of the regex, from (^20.*) to ^20(.*?0?)0{2}.
Here's the explanation:
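^20 matches the hex digits 20 (a space byte) at the start of the line.
.*? lazily matches as few characters as possible.
0{2} matches the two hex digits 00 that terminate the string.
0? after .*? handles the case where you have X0 00, that is, when the last byte before the terminator ends in 0. Without it, the lazy match would stop one digit early, because the trailing 0 of X0 and the first 0 of 00 already form 00.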
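To illustrate the X0 00 case, here is a minimal sketch that runs the pattern with and without the optional 0? on a made-up byte string. The sample bytes and the variable names are assumptions for illustration only; they are not taken from file.dat.

```ruby
# Hypothetical sample: a space (0x20), the ASCII text "E100",
# a 0x00 terminator, and one trailing junk byte (0x18).
sample = " E100\x00\x18".b
hex = sample.unpack1('H*')          # "20453130300018"

# Without 0?, the lazy match stops one hex digit too early.
without_opt = hex[/^20(.*?)0{2}/, 1]
# With 0?, the final 0 of the byte 30 stays inside the capture.
with_opt = hex[/^20(.*?0?)0{2}/, 1]

p without_opt                       # "4531303" (odd length, last byte cut in half)
p with_opt                          # "45313030"
p [with_opt].pack('H*')             # "E100"
```

Note that the pattern works on hex digits rather than on whole bytes, so it relies on the captured data being ASCII text, as in the sample output of the question.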
Also, I'm not including 20 in the captured group, since you remove it later in the code anyway, so you can drop the .gsub(/20/, '') call from this line:
p $3.gsub(/20/,"").gsub(/../) { |b| b.hex.chr }
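Putting the pieces together, the whole loop could look roughly like the sketch below. It is not a drop-in replacement: it reads from an in-memory StringIO holding a made-up record instead of file.dat, and the names HEX_RE, decode and extract are hypothetical helpers introduced here. The separator is passed to each_line directly instead of setting $/.

```ruby
require 'stringio'

# Same regex as above: date and hour in groups 1 and 2, payload in group 3.
HEX_RE = /(.{8}2d.{4}2d.{4})20(.{4}3a.{4}3a.{4})|^20(.*?0?)0{2}/

# Turn a string of hex digits back into raw characters.
def decode(hex)
  [hex].pack('H*')
end

# Split the input on byte 09 and collect every decoded field.
def extract(io)
  out = []
  io.each_line("\x09".b) do |chunk|
    m = HEX_RE.match(chunk.unpack1('H*'))
    next unless m
    out << decode(m[1]) << decode(m[2]) if m[1] && m[2]
    out << decode(m[3]) if m[3]
  end
  out
end

# Made-up record: date, space, hour, 09, space, payload, 00, junk bytes, 09.
data = "2017-10-19 15:43:27\x09 83492624790981030E100000\x00\x18\x0B\x16\x09".b
p extract(StringIO.new(data))
# ["2017-10-19", "15:43:27", "83492624790981030E100000"]
```

For the real file, the same loop should work with File.open("file.dat", "rb") in place of the StringIO, assuming the records follow the layout described in the question.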