Search code examples
cpointerserror-handlingmemory-addressvoid-pointers

Pointer address span on various platforms


A common situation while coding in C is to be writing functions which return pointers. In case some error occurred within the written function during runtime, NULL may be returned to indicate an error. NULL is just the special memory address 0x0, which is never used for anything but to indicate the occurrence of a special condition.

My question is, are there any other special memory addresses which never will be used for userland application data?

The reason I want to know this is because it could effectively be used for error handling. Consider this:

#include <stdlib.h>
#include <stdio.h>

#define ERROR_NULL 0x0
#define ERROR_ZERO 0x1

int *example(int *a) {
    if (*a < 0)
        return ERROR_NULL;
    if (*a == 0)
        return (void *) ERROR_ZERO;
    return a;
}

int main(int argc, char **argv) {
    if (argc != 2) return -1;
    int *result;
    int a = atoi(argv[1]);
    switch ((int) (result = example(&a))) {
        case ERROR_NULL:
            printf("Below zero!\n");
            break;

        case ERROR_ZERO:
            printf("Is zero!\n");
            break;

        default:
            printf("Is %d!\n", *result);
            break;
    }
    return 0;
}

Knowing some special span of addresses which never will be used by userland applications could effectively be utilized for more efficient and cleaner condition handling. If you know about this, for which platforms does it apply?

I guess spans would be operating system specific. I'm mostly interested in Linux, but it would be nice to know for OS X, Windows, Android and other systems as well.


Solution

  • The answer depends a lot on your C compiler and on your CPU and OS, where your compiled C program is going to run.

    Your userland applications typically will never be able to access data or code through pointers pointing to the OS kernel data and code. And the OS usually does not return such pointers to applications.

    Typically they will also never get a pointer pointing to a location that's not backed up by physical memory. You can only get such pointers through an error (a code bug) or by purposefully constructing such a pointer.

    The C standard does not anyhow define what a valid range for pointers is and isn't. In C valid pointers are either NULL pointers or pointers to objects whose lifetime hasn't ended yet and those can be your global and local variables and those created in malloc()'d memory and functions. The OS may extend this range by returning:

    • pointers to code or data objects not explicitly defined in your C program at its source code level (the OS may let apps access some of its code or data directly, but this is uncommon, or the OS may let apps access some of their parts that are either created by the OS when the app loads or created by the compiler when the app was compiled, one example would be Windows letting apps examine their executable PE image, you can ask Windows where the image starts in the memory)
    • pointers to data buffers allocated by the OS for/on behalf of apps (here, usually, the OS would use its own APIs and not your app's malloc()/free(), and you'd be required to use the appropriate OS-specific function to release this memory)
    • OS-specific pointers that can't be dereferenced and only serve as error indicators (e.g. you could have more than just one undereferenceable pointer like NULL and your ERROR_ZERO is a possible candidate)

    I would generally discourage use of hard-coded and magic pointers in programs.

    If for some reason, a pointer is the only way to communicate error conditions and there are more than one of them, you could do this:

    char ErrorVars[5] = { 0 };
    void* ErrorPointer1 = &ErrorVars[0];
    void* ErrorPointer2 = &ErrorVars[1];
    ...
    void* ErrorPointer5 = &ErrorVars[4];
    

    You can then return ErrorPointer1 through ErrorPointer1 on different error conditions and then compare the returned value against them. There' a caveat here, though. You cannot legally compare a returned pointer with an arbitrary pointer using >, >=, <, <=. That's only legal when both pointers point to or into the same object. So, if you wanted a quick check like this:

    if ((char*)(p = myFunction()) >= (char*)ErrorPointer1 &&
        (char*)p <= (char*)ErrorPointer5)
    {
      // handle the error
    }
    else
    {
      // success, do something else
    }
    

    it would only be legal if p equals one of those 5 error pointers. If it's not, your program can legally behave in any imaginable and unimaginable way (this is because the C standard says so). To avoid this situation you'll have to compare the pointer against each error pointer individually:

    if ((p = myFunction()) == ErrorPointer1)
      HandleError1();
    else if (p == ErrorPointer2)
      HandleError2();
    else if (p == ErrorPointer3)
      HandleError3();
    ...
    else if (p == ErrorPointer5)
      HandleError5();
    else
      DoSomethingElse();
    

    Again, what a pointer is and what its representation is, is compiler- and OS/CPU-specific. The C standard itself does not mandate any specific representation or range of valid and invalid pointers, so long as those pointers function as prescribed by the C standard (e.g. pointer arithmetic works with them). There's a good question on the topic.

    So, if your goal is to write portable C code, don't use hard-coded and "magic" pointers and prefer using something else to communicate error conditions.