Tags: struct, command-line, decode, dump, od

Equivalent of a multi-type-struct od output


I can use od when I want to dump the contents of a non-textual file to a terminal (or a text file) as human-readable values: I can peer into files whose elements are of any one of various types, such as signed or unsigned integers, floating-point numbers, or printable ASCII characters. (od stands for "octal dump", and it can also print data in other bases such as hexadecimal, but that's not what I care about here.)

The limitation is that od assumes the input has a single, uniform data type. But what if this is not the case? What if I have triplets of, say, a single-byte unsigned value, then a 4-byte floating-point value, and then a 2-byte signed integer, i.e. u1,f4,d2 in od terms?

I would like to see a sequence of triplets of numbers of these types printed for me, with any reasonable convention for line breaks and field delimiters. I'd like to specify my struct/tuple format as above, i.e. as comma-separated od-style types, but I'm flexible on the specifics.

Can I use the shell and common command-line tools to achieve this relatively painlessly?


Solution

  • The od command accepts several formats in a single -t option (e.g., -t u1f4d2 in your case) and outputs one line per requested type. Listing a type more than once (say, u1 for each of several bytes) would only add duplicate lines, so one of each type is enough. Generating some data like you describe gives something like the following, with one line of output per requested type:

    % echo "128 255 12 3.7 -12" | perl -ne 'print pack("CCCfs", split)' | od -An -tu1f4d2
     128 255  12 205 204 108  64 244 255               // u1
      -1.4784717e+08  -6.0981913e+31        3.57e-43   // f4
        -128  -13044   27852   -3008     255           // d2
    

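    Unfortunately, od applies each requested type to the whole line independently, always starting from the line's first byte: the f4 line decodes bytes 0-3, 4-7, 8-11, and the d2 line decodes bytes 0-1, 2-3, and so on. In your example the three unsigned bytes put the float at offset 3, which is not a multiple of its 4-byte size, so it straddles two of od's groups and is never decoded correctly (the same happens to the signed short at offset 7).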
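    To confirm that the problem is where od starts its 4-byte groups rather than the float bytes themselves, you can point od straight at the float with -j (skip bytes) and -N (read bytes). A minimal sketch, assuming perl and GNU od as in the examples above; record.bin is just a scratch file name:

    ```shell
    # Same unpadded 9-byte record as above: three unsigned bytes, a float and
    # a signed short (perl's pack adds no padding between the fields).
    perl -e 'print pack("CCCfs", 128, 255, 12, 3.7, -12)' > record.bin

    # -j 3 skips the three leading bytes and -N 4 reads only the float's bytes,
    # so od's first (and only) 4-byte group starts exactly on the float.
    od -An -j 3 -N 4 -tf4 record.bin    # prints 3.7, padded with leading spaces
    ```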

    However, if every field starts at an offset that is a multiple of its own size (for 4-byte values, a 32-bit word boundary), you can get pretty close. Inserting an additional unsigned byte after your three bytes gives:

    % echo "128 255 12 255 3.7 -12" | perl -ne 'print pack("CCCCfs", split)' | od -An -tu1f4d2
     128 255  12 255 205 204 108  64 244 255
      -1.8741855e+38             3.7      9.1819e-41  // we get the correct float
        -128    -244  -13107   16492     -12          // and signed short
    
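    The rule behind both outputs is that a field of size s is decoded correctly only if its byte offset within od's output line is a multiple of s. As a sketch, assuming a single record that starts at offset 0 and fits in one od line (16 bytes by default), a small helper (check_alignment is a hypothetical name, not a standard tool) can list which fields of a comma-separated od-style format would line up:

    ```shell
    # Hypothetical helper: check_alignment FORMAT, e.g. u1,f4,d2.
    # Each type spec is assumed to be one letter plus a numeric byte size.
    # A field of size s at byte offset o is decoded correctly by a combined
    # od -t call only when o is a multiple of s.
    check_alignment() {
        offset=0
        for t in $(echo "$1" | tr ',' ' '); do
            size=${t#?}    # strip the type letter, keep the size digits
            if [ $((offset % size)) -eq 0 ]; then
                echo "$t at offset $offset: aligned"
            else
                echo "$t at offset $offset: misaligned"
            fi
            offset=$((offset + size))
        done
    }

    # The unpadded layout from the first example:
    check_alignment u1,u1,u1,f4,d2
    # u1 at offset 0: aligned
    # u1 at offset 1: aligned
    # u1 at offset 2: aligned
    # f4 at offset 3: misaligned
    # d2 at offset 7: misaligned

    # The padded layout: every field reports "aligned".
    check_alignment u1,u1,u1,u1,f4,d2
    ```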

    With this layout, a little more shell work gets close to what you asked for:

    % echo "128 255 12 255 3.7 -12" | perl -ne 'print pack("CCCCfs", split)' | od -An -tu1f4d2 | paste -sd '  \n' | awk '{ print $1, $2, $3, $12, $18 }'
    
    128 255 12 3.7 -12
    

    Decoding that command pipeline a bit:

    Command                                   Description
    echo "128 255 12 255 3.7 -12"             create some data in the requested form (four unsigned bytes, a float, and a signed short)
    perl -ne 'print pack("CCCCfs", split)'    write the values as binary
    od -An -tu1f4d2                           decode the binary; od writes one line of output per requested type:
                                                • decoded as unsigned bytes
                                                • decoded as floats
                                                • decoded as signed shorts
    paste -sd '  \n'                          join the three lines into one (the delimiter list cycles: space, space, newline)
    awk '{ print $1, $2, $3, $12, $18 }'      print the selected fields from the space-separated output

    awk is just one option for isolating the fields you're looking for.
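    For example, the shell's own word splitting can stand in for awk: assigning the whole od output to the positional parameters discards od's alignment spaces, and the same field numbers then apply. A sketch assuming a POSIX shell and the same padded 10-byte record as above; record is just an illustrative variable name:

    ```shell
    # Word-splitting the unquoted $(...) discards od's alignment padding,
    # so every decoded value becomes one positional parameter ($1 .. $18).
    set -- $(echo "128 255 12 255 3.7 -12" |
             perl -ne 'print pack("CCCCfs", split)' |
             od -An -tu1f4d2)

    # Same field numbers as the awk version; positional parameters
    # above 9 need braces.
    record="$1 $2 $3 ${12} ${18}"
    echo "$record"    # prints: 128 255 12 3.7 -12
    ```

    Unlike awk, this only works when the whole output is one record, since all values land in a single list of parameters.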

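    If you need to do this for multiple structures of the same size, you can combine od's -N (number of bytes to read) and -w (number of bytes per output line) options, setting the width to the record size. The limitation is that the width must be a multiple of the size of every requested type, so the record size must be a multiple of the word (e.g., 32-bit) size (the 10-byte padded record above would need two more padding bytes), and -N should be a multiple of the width; also add -v, because by default od collapses identical consecutive lines into a single "*". Alternatively, you might use a loop in a shell script that combines -j <n> (have od skip the first n bytes of the file) with -N to decode one record, or one field, at a time.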
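    The -j/-N loop can also be driven directly by a comma-separated, od-style format like the one in the question, which sidesteps the alignment problem entirely because each field gets its own od call. A minimal sketch, assuming a POSIX shell, perl for the test data, type specs written as one letter plus a numeric size (u1, f4, d2, not fF or dS), and fields packed back to back with no padding; decode_records and records.bin are hypothetical names:

    ```shell
    # Hypothetical helper (not a standard tool): decode_records FORMAT FILE
    # FORMAT is a comma-separated list of od type specs, each one letter plus a
    # numeric byte size (e.g. u1,f4,d2); fields are assumed to be packed back
    # to back with no padding, and a trailing partial record is ignored.
    decode_records() {
        fmt=$1
        file=$2

        # Record size = sum of the field sizes (the digits after each letter).
        recsize=0
        for t in $(echo "$fmt" | tr ',' ' '); do
            recsize=$((recsize + ${t#?}))
        done

        total=$(( $(wc -c < "$file") ))
        offset=0
        while [ $((offset + recsize)) -le "$total" ]; do
            line=
            for t in $(echo "$fmt" | tr ',' ' '); do
                size=${t#?}
                # -j jumps to the field and -N reads exactly its bytes, so the
                # one group od decodes always starts on the field itself.
                val=$(od -An -j "$offset" -N "$size" -t "$t" "$file" | tr -d ' ')
                line="$line${line:+ }$val"
                offset=$((offset + size))
            done
            echo "$line"
        done
    }

    # Usage: two records in the question's u1,f4,d2 layout, unpadded (7 bytes each).
    perl -e 'print pack("CfsCfs", 128, 3.7, -12, 7, -0.5, 300)' > records.bin
    decode_records u1,f4,d2 records.bin
    # 128 3.7 -12
    # 7 -0.5 300
    ```

    Spawning one od process per field is slow on large files; at that point perl's unpack, the counterpart of the pack calls used above, is likely the better tool.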