c++ c linux unix xxd

Script/Tool to convert C source code encoded by "xxd -i" to C source code?


Is there a Linux/Unix tool that can be used to convert the hex dump array of a C file (i.e. the output of xxd -i) to the corresponding source code?


Solution

  • The output of xxd -i xyz.c for a source file xyz.c looks like:

    unsigned char xyz_c[] = {
      0x23, 0x69, 0x6e, 0x63, 0x6c, 0x75, 0x64, 0x65, 0x20, 0x3c, 0x73, 0x74,
      0x64, 0x69, 0x6f, 0x2e, 0x68, 0x3e, 0x0a, 0x23, 0x69, 0x6e, 0x63, 0x6c,
      0x75, 0x64, 0x65, 0x20, 0x3c, 0x73, 0x74, 0x64, 0x6c, 0x69, 0x62, 0x2e,
      0x68, 0x3e, 0x0a, 0x23, 0x69, 0x6e, 0x63, 0x6c, 0x75, 0x64, 0x65, 0x20,
      0x3c, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x2e, 0x68, 0x3e, 0x0a, 0x0a,
      …
      0x65, 0x5f, 0x6c, 0x69, 0x73, 0x74, 0x28, 0x73, 0x74, 0x61, 0x72, 0x74,
      0x29, 0x3b, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x7d, 0x0a, 0x0a, 0x20, 0x20,
      0x20, 0x20, 0x72, 0x65, 0x74, 0x75, 0x72, 0x6e, 0x20, 0x30, 0x3b, 0x0a,
      0x7d, 0x0a
    };
    unsigned int xyz_c_len = 4442;
    

    Assume that is stored in a file xyz.xxd.

    In many respects, the easiest way to regenerate the original code is to compile and run a small program that includes the dump:

    #include <stdio.h>
    #include "xyz.xxd"
    
    int main(void)
    {
        for (unsigned int i = 0; i < xyz_c_len; i++)
            putchar(xyz_c[i]);
        return 0;
    }
    

    With some more care and some macros, you could turn that into a general-purpose outline program for the job; you would need to supply the file name and the two C variable names to be used.
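
    One way the macro idea might look is sketched below. The macro names XXD_FILE, XXD_ARRAY and XXD_LEN, the function emit_embedded, and the stand-in demo data are all hypothetical names chosen for illustration; they are not part of xxd.

```c
#include <stdio.h>

/*
 * XXD_FILE, XXD_ARRAY and XXD_LEN are hypothetical macro names; supply all
 * three on the compiler command line to select the dump and its variables.
 */
#ifdef XXD_FILE
#include XXD_FILE
#else
/* Stand-in data so the sketch compiles on its own: the bytes of "hi\n". */
unsigned char demo_txt[] = { 0x68, 0x69, 0x0a };
unsigned int demo_txt_len = 3;
#define XXD_ARRAY demo_txt
#define XXD_LEN   demo_txt_len
#endif

/* Write the embedded bytes to out; returns the number of bytes written. */
size_t emit_embedded(FILE *out)
{
    return fwrite(XXD_ARRAY, 1, XXD_LEN, out);
}
```

    Compiled with something like cc -DXXD_FILE='"xyz.xxd"' -DXXD_ARRAY=xyz_c -DXXD_LEN=xyz_c_len, plus a main that calls emit_embedded(stdout), this should reproduce xyz.c on standard output.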

    If you can't (or don't want to) use a C compiler for the job, then writing a tool using Python or Perl is a straightforward exercise. For example, a not necessarily minimal Perl script is:

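    #!/usr/bin/perl -na
    # Perl is named directly: Linux would pass "perl -na" to env as a single argument (adjust the path if needed).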
    use strict;
    use warnings;
    
    # xxd -i drops the final comma - aargh (why?)!
    foreach my $word (@F)
    {
        next unless $word =~ m/^0[Xx][[:xdigit:]]{2},?$/;
        $word =~ s/,//;
        printf "%c", hex($word);
    }
    

    It uses the 'auto-split' option (-a) and the 'read each line automatically but do not print' option (-n). It then picks out each word in the input that looks like a hex byte value, such as 0x0a, optionally followed by a comma (xxd -i somewhat unnecessarily omits the comma after the final byte value), and converts it to the corresponding byte. All other words, including the array declaration and the length line, are skipped. It being Perl, TMTOWTDI: There's More Than One Way To Do It.
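
    For comparison, the same word-by-word matching can be sketched in C without including the dump at compile time. This is only an illustration under stated assumptions: the function names xxd_word_to_byte and xxd_decode_stream are hypothetical, and the 63-character word limit is an arbitrary choice.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode one whitespace-delimited word of `xxd -i` output.
 * Accepts exactly 0xHH or 0xHH, (mirroring the Perl pattern) and returns
 * the byte value 0..255; returns -1 for every other word.
 */
int xxd_word_to_byte(const char *word)
{
    size_t len = strlen(word);

    if (len == 5 && word[4] == ',')
        len = 4;                      /* ignore one trailing comma */
    if (len != 4 || word[0] != '0' || (word[1] != 'x' && word[1] != 'X'))
        return -1;
    if (!isxdigit((unsigned char)word[2]) || !isxdigit((unsigned char)word[3]))
        return -1;

    char digits[3] = { word[2], word[3], '\0' };
    return (int)strtol(digits, NULL, 16);
}

/*
 * Copy every byte decoded from in to out and return how many were written.
 * Words longer than 63 characters are read in pieces, which is assumed not
 * to matter for the short words that xxd -i emits.
 */
size_t xxd_decode_stream(FILE *in, FILE *out)
{
    char word[64];
    size_t count = 0;

    while (fscanf(in, "%63s", word) == 1) {
        int byte = xxd_word_to_byte(word);
        if (byte >= 0) {
            putc(byte, out);
            count++;
        }
    }
    return count;
}
```

    Called from a main as xxd_decode_stream(stdin, stdout), this would act as a filter in the same way as the Perl script, for example ./unxxd < xyz.xxd > xyz.c (unxxd being a made-up program name).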