How to use null bytes in gperf?


The gperf info pages claim that if you specify -l, then:

The keywords in the input file may contain NUL bytes, written in string syntax as \000 or \x00, and the code generated by gperf will treat NUL like any other byte

However, when I run this input file through gperf -L C++ -l:

foo
\000bar\000
\x00baz\x00
bat

I get:

  <snip>
  static const char * wordlist[] =
    {
      "", "", "",
      "foo",
      "", "", "", "",
      "bat",
      "", "",
      "\\x00baz\\x00",
      "", "", "", "",
      "\\000bar\\000"
    };
  <snip>

This looks like gperf is treating \000 and \x00 as literal text rather than as null bytes.

How can I correctly specify null bytes in my gperf strings?


Solution

  • More precise documentation of the input syntax is in the section "Format for Keyword Entries":

    It can be given in two ways: as a simple name, i.e., without surrounding string quotation marks, or as a string enclosed in double-quotes, in C syntax, possibly with backslash escapes like \" or \234 or \xa8.

    gperf's test suite also contains an example:

    "\x00\x45\x00\x6E\x00\x67\x00\x6C\x00\x69\x00\x73\x00\x68",    "English",    "en_GB.UTF-8"

    In other words, backslash escapes are interpreted only inside double-quoted keywords. An unquoted keyword such as \000bar\000 is a simple name and is taken literally, which is why the backslashes come out doubled in the generated wordlist. Writing the keyword as "\000bar\000", with the quotes, should give real NUL bytes. One caveat, assuming gperf follows C here: in C string syntax a \x escape runs through every following hexadecimal digit, and b and a are hexadecimal digits, so "\x00baz\x00" would not read as NUL followed by baz. The octal form \000 is limited to three digits and avoids the problem.
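
Separately from the input syntax, it may help to see why the length matters once a keyword contains NUL bytes. The following is a minimal, self-contained C++ sketch, not gperf's generated code: the names kKeyword, kKeywordLen and matches_keyword are hypothetical and exist only for illustration. It shows that a NUL-terminated view of such a keyword is empty, whereas a comparison driven by an explicit length treats NUL like any other byte.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical keyword for illustration: NUL, 'b', 'a', 'r', NUL.
// An octal escape is at most three digits, so the 'b' after \000
// is a separate byte.
static const char kKeyword[] = "\000bar\000";

// sizeof also counts the implicit terminator, so subtract one: 5 bytes.
static const std::size_t kKeywordLen = sizeof(kKeyword) - 1;

// Length-aware comparison: the caller supplies the length, and
// memcmp compares every byte, including embedded NULs.
bool matches_keyword(const char* s, std::size_t len) {
    return len == kKeywordLen && std::memcmp(s, kKeyword, len) == 0;
}
```

Here std::strlen(kKeyword) is 0 because the first byte is already NUL, so any lookup for such a keyword has to be given the length explicitly, for example matches_keyword("\0bar\0", 5).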