ruby binaryfiles binary-data pack unpack

Ruby - Unpack array with mixed types


I am trying to use unpack to decode a binary file. The binary file has the following structure:

ABCDEF\tFFFABCDEF\tFFFF....

where

ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times

I know how to do this when all the types are the same, or when there are only numbers and fixed-length arrays, but I am struggling with this case. For example, if I had a list of floats, I would do

s.unpack('F*')

Or if I had integers and floats like

[1, 3.4, 5.2, 4, 2.3, 7.8]

I would do

s.unpack('CF2CF2')

But in this case I am a bit lost. I was hoping to use a format string such as '(CF2)*' with brackets, but it does not work.

I need to use Ruby 2.0.0-p247, if that matters.

Example

ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')

then

s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]

Finally

s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'

Solution

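  • You could read the file in chunks of 19 bytes (7 bytes for the string and its tab, plus 4 bytes for each of the 3 floats) and use 'A7fff' to pack and unpack. Do not use the pointer directives ('p' and 'P'): they store a memory address instead of the string's bytes, and on a 64-bit platform that address alone takes 8 bytes, so the record no longer fits in 19 bytes. When unpacking, you could also use 'A6xfff' to skip the 7th byte (the tab) and get a string of 6 characters.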

    Here's an example, which is similar to the documentation of IO.read:

    data = [["ABCDEF\t", 3.4, 5.6, 9.1], 
            ["FEDCBA\t", 2.5, 8.9, 3.1]]
    binary_file = 'data.bin'
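    chunk_size = 19    # 7 bytes of text (with the tab) + 3 floats of 4 bytes each
    pattern = 'A7fff'  # 'A7': 7-byte string, 'f': native single-precision float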
    
    File.open(binary_file, 'wb') do |o|
      data.each do |row|
        o.write row.pack(pattern)
      end
    end
    
    raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size
    
    File.open(binary_file, 'rb') do |f|
      while record = f.read(chunk_size)
        puts '%s %g %g %g' % record.unpack(pattern)
      end
    end
    # =>
    #    ABCDEF   3.4 5.6 9.1
    #    FEDCBA   2.5 8.9 3.1
    

    If your file is large, you could read a multiple of 19 bytes at a time to speed up the process.
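Reading several records per call could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper `read_records`, the constant names and the batch size of 1000 are made up for this example, and a `StringIO` stands in for the real file so the snippet is self-contained. Since a grouped template like '(CF2)*' did not work for you, the per-record pattern is simply repeated with string multiplication.

```ruby
require 'stringio'

RECORD_PATTERN = 'A7fff' # 7-byte string + three native single-precision floats
RECORD_SIZE    = 19      # 7 + 3 * 4 bytes
BATCH_RECORDS  = 1000    # hypothetical batch size, tune it for your file

# Reads fixed-size records from io in batches and returns an array of
# [name, f1, f2, f3] rows. io can be any object that responds to read(length).
def read_records(io, batch_records = BATCH_RECORDS)
  rows = []
  while (chunk = io.read(RECORD_SIZE * batch_records))
    # Number of complete records in this chunk; a trailing partial
    # record (if the file is truncated) is ignored.
    count = chunk.bytesize / RECORD_SIZE
    # Repeat the per-record template once per record in the chunk.
    values = chunk.unpack(RECORD_PATTERN * count)
    values.each_slice(4) { |row| rows << row }
  end
  rows
end

data = [["ABCDEF\t", 3.4, 5.6, 9.1],
        ["FEDCBA\t", 2.5, 8.9, 3.1]]

# Stand-in for File.open(binary_file, 'rb'): the same bytes, kept in memory.
io = StringIO.new(data.map { |row| row.pack(RECORD_PATTERN) }.join)

rows = read_records(io)
rows.each { |row| puts '%s %g %g %g' % row }
```

With a real file you would pass the handle from `File.open(binary_file, 'rb')` instead of the `StringIO`. A larger batch means fewer `read` and `unpack` calls, at the cost of holding more of the file in memory at once.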