Temporarily escape function scope to define global symbols in C? (gcc)


Is there any way, in standard C or, failing that, with gcc/gas/binutils etc., to define a global symbol using code (probably a macro) that's written within the syntactic scope of a function body?

Ideally without needing a separate macro for each such symbol outside the function body.

For this purpose it's fine if I have to explicitly specify the ELF section to put the generated symbol in, and fine if I have to use asm features like .pushsection and .popsection (if they can help). Any and all gcc extensions are fair game.

The reason I want to do this crazy thing is so that a tracepoint declaration (which must appear in a function body) can autogenerate associated metadata that the tracing tool can look up, without the need to separately pre-declare the tracepoints, i.e. to make it as DRY as possible.

Pseudocode

void foo(void)
{
    normal_app_code();
    MAGICALLY_DEFINE_A_GLOBAL_VARIABLE_SYMBOL(symboltype, symbolname);
    more_normal_app_code();
}

such that the result of compilation is as if the above had been written:

symboltype symbolname;

void foo(void)
{
    normal_app_code();
    some_library_function_that_uses(symbolname);
    more_normal_app_code();
}

Context

I'm working on an idea for enhancing the systemtap/dtrace tracing API to support recording data types and names for probe arguments.

The design I have currently requires application authors to insert additional macros in top-level (global) scope to define the symbols that the stap runtime will use to discover the names and arg types of the probes, e.g. the contrived:

STAP_PROBE2_ARGNAMES(myprovider, myprobe, foo, bar);
STAP_PROBE2_ARGTYPES(myprovider, myprobe, const char *, MyDataType*);

void something(const char *foo)
{
    MyDataType *bar = get_bar();
    STAP_PROBE2(myprovider, myprobe, foo, bar);
}

That's inconvenient and it's going to be error-prone, especially if the app wants to autogenerate STAP_PROBEn(..) probe points as part of its own macros.

I'd rather declare the arg type and name info alongside the tracepoint, at the same site, using preprocessor stringification to capture arg names (when they're simple variable name tokens) and the __typeof__ operator to capture their type names, something like:

void something(const char *foo)
{
    MyDataType *bar = get_bar();
    STAP_PROBE2_ARGINFO(myprovider, myprobe, foo, bar);
}

or for non-simple-token expression arguments something like:

void something(void)
{
    MyDataTypeHolder *barholder = get_barholder();
    STAP_PROBE2_ARGNAMES(myprovider, myprobe,
        something_global->foo, "foo",
        barholder->bar, "bar");
}

The macro expansion of STAP_PROBE2_ARGINFO takes care of generating a pair of symbols, each an array of strings, in the global symbol table, using appropriate stringification and the __typeof__ operator, e.g. the pseudo-ish C:

#define STAP_PROBE2_ARGINFO(myprovider, myprobe, arg1, arg2) \
    STAP_PROBE2(myprovider, myprobe, (arg1), (arg2)); \
    STAP_PROBE2_ARGTYPES(myprovider, myprobe, (arg1), (arg2)) \
    STAP_PROBE2_ARGNAMES(myprovider, myprobe, arg1, arg2)

#define STAP_PROBE2_ARGNAMES(myprovider, myprobe, argname1, argname2) \
    const char _stapargnames_##myprovider##_##myprobe[2][] = {#argname1, #argname2};

/* pseudo-C: __typeof__ yields a type, not a string */
#define STAP_PROBE2_ARGTYPES(myprovider, myprobe, arg1, arg2) \
    const char _stapargtypes_##myprovider##_##myprobe[2][] = {__typeof__((arg1)), __typeof__((arg2))};

... and something similar for STAP_PROBE2_ARGNAMES that extracts and stores the explicitly supplied arg names separately.

The goal is that the result would resemble global declarations of:

const char _stapargnames_myprovider_myprobe[2][] = {"foo", "bar"};
const char _stapargtypes_myprovider_myprobe[2][] = {"const char *", "MyDataType*"};

and also emit the usual asm for the probe point itself at the callsite where the STAP_PROBE2_ARGINFO(...) appeared, as if it was a normal STAP_PROBE2(...).

Crazy?

Possible?


Solution

  • I don't see why you can't do this with __asm__ and .pushsection. Make an extern declaration for the variable in C (which is valid in block scope) so that it's accessible from the C code, and pass its size as an integer literal operand to the __asm__. Then, inside the __asm__, you can define the symbol, make it .global if you wish (or not), and reserve space for it based on the passed-in size.
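
A minimal sketch of that idea, assuming GCC or Clang targeting ELF with a GNU-as-compatible assembler. The macro name DEFINE_GLOBAL_IN_FUNCTION, the variable trace_hits, and the helper functions bump_counter and read_counter are made up for illustration; none of them come from systemtap or sdt.h.

```c
#include <stddef.h>

/*
 * Hypothetical macro: declare `name` as an extern object of `type` in block
 * scope, then define the symbol from the same spot with inline asm.
 * Operand 0 is the size and operand 1 the alignment; the %c modifier prints
 * each constant without any target-specific punctuation.
 * Assumes an ELF target (.pushsection/.popsection, .type, .size).
 */
#define DEFINE_GLOBAL_IN_FUNCTION(type, name)                       \
    extern type name;                                               \
    __asm__ volatile(".pushsection .data\n\t"                       \
                     ".balign %c1\n\t"                              \
                     ".globl " #name "\n\t"                         \
                     ".type " #name ", %%object\n"                  \
                     #name ":\n\t"                                  \
                     ".zero %c0\n\t"                                \
                     ".size " #name ", %c0\n\t"                     \
                     ".popsection"                                  \
                     : /* no outputs */                             \
                     : "i"(sizeof(type)), "i"(__alignof__(type)))

/*
 * noinline keeps the compiler from emitting this body (and therefore the
 * symbol definition) a second time in a caller.
 */
__attribute__((noinline)) void bump_counter(void)
{
    /* Defines the global `trace_hits`, zero-initialised, in .data. */
    DEFINE_GLOBAL_IN_FUNCTION(long, trace_hits);
    trace_hits += 1;
}

/* Any other code can reach the symbol with an ordinary extern declaration. */
long read_counter(void)
{
    extern long trace_hits;
    return trace_hits;
}
```

Calling bump_counter() twice and then read_counter() should give 2, and the symbol should show up as a global data object in nm or readelf -s output. One caveat with this sketch: the asm is emitted once for every copy of the function body the compiler generates, so if the function gets inlined or cloned the assembler will see a duplicate definition. That is why the example marks the function noinline; a real macro would need some other guard.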