c header

Is including a header file this way good practice?


In the Gumbo HTML parser source code, I saw a strange use of #include: a header file is included inside an array definition block.

const char* kGumboTagNames[] = {
#include "tag_strings.h"
    "",  // TAG_UNKNOWN
    "",  // TAG_LAST
};

static const unsigned char kGumboTagSizes[] = {
#include "tag_sizes.h"
    0,  // TAG_UNKNOWN
    0,  // TAG_LAST
};

and then the tag_strings.h file lists all legal HTML tags:

"html",
"head",
"title",
"base",
"link",
"meta",
"style",
"script",
"noscript",
...
...

I know it works, but I'd still like to know whether this is a traditional way to import external data or just an unusual hack.


Solution

  • This is a decent and somewhat traditional way to use #include, if the array contents you're including have been automatically generated by some other process. (There are other ways of doing it, though, as I'll mention below.)

    It is a classic tradeoff. Most of the time, a good rule that's worth following is that you should use the preprocessor only in straightforward, traditional ways, because when you start getting "tricky" it's all too easy to create an unholy mess.

    But sometimes, you have an array whose contents you really want to generate using some automatic, external process. Copying and pasting the array definition into your source file would be tedious and error-prone, and you would have to redo it every time the array needed updating. Figuring out a way to automate the process is worth doing, and might even be worth violating the "no tricky preprocessor abuse" rule.

    To make this work you will usually want to have some automatic process (perhaps an awk, sed, perl, or python script) that generates the included file with the correct syntax. If you're using make or something like it, you can have that step automatically performed whenever the actual source data for the array changes. For instance, in the example you gave, you might have an original "source file" tags.list containing lines like

    html
    head
    title
    

    and then in your Makefile use something like sed 's/.*/"&",/' to create the include file with the proper string initializer syntax. That way you don't force the folks who are updating the list to remember to always use the right quotes and commas.
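
    If you'd rather not depend on sed, the same generation step can be written in C. What follows is only a sketch under assumptions: the names quote_line and quote_stream are made up for illustration, lines are assumed to be shorter than 255 characters, and tag names are assumed to contain no quotes or backslashes (nothing is escaped). Unlike the sed one-liner, it skips blank lines instead of emitting "" for them.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: turn one line of tags.list (e.g. "html\n") into
       one initializer line (e.g. "\"html\",\n") stored in out.
       Returns 0 on success, -1 if out is too small. */
    int quote_line(const char *line, char *out, size_t outsize)
    {
        assert(line != NULL && out != NULL);
        size_t len = strcspn(line, "\r\n");   /* length without the line ending */
        int n = snprintf(out, outsize, "\"%.*s\",\n", (int)len, line);
        return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
    }

    /* Hypothetical filter: copy every non-blank line of in to out, quoted.
       Returns 0 on success, -1 on an over-long line or a write error. */
    int quote_stream(FILE *in, FILE *out)
    {
        char line[256], quoted[260];
        while (fgets(line, sizeof line, in) != NULL) {
            if (strchr(line, '\n') == NULL && !feof(in))
                return -1;                    /* line did not fit in the buffer */
            if (strcspn(line, "\r\n") == 0)
                continue;                     /* skip blank lines */
            if (quote_line(line, quoted, sizeof quoted) != 0)
                return -1;
            if (fputs(quoted, out) == EOF)
                return -1;
        }
        return 0;
    }
    ```

    A main function that simply calls quote_stream(stdin, stdout) turns this into a filter you can run from the Makefile in place of the sed command.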

    Also, as other commenters have suggested, you should probably give the file a name ending in something other than .h, to make clear that it's not an ordinary header file containing complete, valid C declarations. Better possibilities in this case would be .tab, .inc, or .arr.

    With a little more work, though, you could avoid the "hack" and do things just about 100% conventionally. If you tweaked your script to add the line const char* kGumboTagNames[] = { at the beginning of the generated file, and }; at the end, you could give it a name ending in .c, and just compile it, rather than including it. (This approach, however, would involve its own tradeoff, in that it would constrain the array to being global, not static or local.)
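
    To make that concrete, here is roughly what the pieces might look like, shown as one listing with comments marking where each file would begin. The file names, the kGumboTagNameCount variable, and the tag_name function are assumptions for illustration, not part of gumbo. The count variable is there because other source files see only an array of unknown size, so they can't use sizeof on it themselves.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* ---- tag_names.h (hand-written; hypothetical name) ---- */
    extern const char *kGumboTagNames[];
    extern const size_t kGumboTagNameCount;

    /* ---- tag_names.c (generated; the script also emits the
            opening line and the closing brace) ---- */
    const char *kGumboTagNames[] = {
        "html",
        "head",
        "title",
    };
    /* sizeof works here because the full definition is visible. */
    const size_t kGumboTagNameCount =
        sizeof kGumboTagNames / sizeof kGumboTagNames[0];

    /* ---- some_user.c (includes tag_names.h; hypothetical) ---- */
    const char *tag_name(size_t i)
    {
        assert(i < kGumboTagNameCount);   /* bounds check via the exported count */
        return kGumboTagNames[i];
    }
    ```

    With this layout nothing unusual is ever passed to #include; the generated file is just another translation unit that make compiles and links.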

    Footnote: In some languages -- and even in C and C++, under some circumstances -- the comma is used as a separator, and you're not allowed to have one after the last element of a list. But in array initializers, you are allowed to have that trailing comma, and it turns out to be a pretty nice and useful freedom, precisely because it allows you to use straightforward techniques like the one described here, without the nuisance of having to insert an explicit extra step to get rid of the comma after the last element in the list.
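
    A small illustration of that footnote, with made-up names (kNames, kNameCount, check_names): every element line ends in a comma, just as a line-by-line generator would emit them, and the final comma does not create an extra element.

    ```c
    #include <assert.h>
    #include <stddef.h>

    static const char *const kNames[] = {
        "html",
        "head",
        "title",   /* trailing comma: allowed in an array initializer */
    };

    /* Element count computed by the compiler from the initializer. */
    enum { kNameCount = sizeof kNames / sizeof kNames[0] };

    /* The trailing comma does not add an empty fourth element. */
    void check_names(void)
    {
        assert(kNameCount == 3);
        assert(kNames[kNameCount - 1][0] == 't');
    }
    ```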