
Checking if a C/C++ pointer points to read-only memory during runtime (in Linux OS)


Back in the day, low-level C/C++ code on Linux checked whether a pointer ptr pointed to read-only memory like this:

extern char etext, edata;
...
(ptr > &etext) && (ptr < &edata);

And in fact, from time to time I still see that in action.

However, the answers to a question posted by someone else last year (Why does subtracting the value of etext from edata not give me the correct size for my text segment) show that the traditional contiguous memory layout text -> data -> bss -> heap -> ... -> stack -> kernel is no longer guaranteed.

This puzzles me, because as I said I still see the above code in use today. In addition, the OP of that linked question understandably left a comment asking where to find information about current memory section layouts, but the answer authors did not know. This is quite relevant technical information, yet I could not find anything further that would help me determine whether the above code is still useful, or how to modify it to match modern memory section layout rules.

This answer to a slightly different question suggests the following check for whether a pointer is in static-variable memory, to account for the fact that the order of memory sections may now be random:

(void*)(x) <= (void*)&end || (void*)(x) <= (void*)&edata

The rationale is that it suffices to check that the pointer is located before the end of BSS or before the end of the initialized data to conclude that it is in static memory.

So, my main question has two parts:

  • to my understanding, the latter approach above silently assumes that, however random the location of the heap might be, it comes after BSS and read-only data, never in between. Is that a safe assumption?

  • if so, and given that the text (code) segment at least comes first, can we adapt the approaches above to detect that a pointer is specifically in read-only data (excluding BSS) on modern Linux systems? I know there is no standard, generalizable solution, but for now I am trying to solve the problem for Linux only (although solutions for macOS may be quite similar, since compilers often provide equivalents of edata, etext and end there too).

My attempt is:

#include <iostream>
extern char etext, edata, end; // linker-provided symbols; roMem() below also uses end


// the following is supposed to detect whether a pointer is in the stack,
// according to this answer:
// https://stackoverflow.com/questions/35206343/is-it-possible-to-identify-whether-an-address-reference-belongs-to-static-heap-s
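// NOTE: stack_bottom should be set to the address of a local at the start of main();
// it is never assigned in this program, so inStack() always returns false here.
void *stack_bottom;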
bool __attribute__((noinline)) inStack(void *x) {
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

// then my attempt at checking if a pointer is in read-only data,
// excluding the BSS segment:
bool roMem(void* c) {
    if(&etext < &edata && &end > &edata)
    {
        return (c > &etext) && (c < &edata) && (!inStack((void*)c));
    } else if(&etext > &edata && &end > &edata)
    {
        return (void*)(c) <= (void*)&edata && !inStack((void*)c);
    } else if(&end < &edata)
    {
        return (void*)(c) > (void*)&end && (void*)(c) <= (void*)&edata && !inStack((void*)c);
    }
    return false; // none of the layouts above matched (e.g. &end == &edata)
}

class Test {
    public:
    const char* d;
    Test(const char* c) { d = c; }
};

static const char* str5 = "short";

int main() {
    const char* str1 = "short";
    char* str2 = const_cast<char*>("short");
    const char str3[6] = {'s', 'h', 'o', 'r', 't', 0};
    const char* str4 = "longgggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg";
    std::cout << roMem((void*)str1) << std::endl; // prints 1
    std::cout << roMem((void*)(const char*)str2) << std::endl; // prints 1
    std::cout << roMem((void*)str3) << std::endl; // prints 0
    std::cout << roMem((void*)str4) << std::endl; // prints 1
    std::cout << roMem((void*)str5) << std::endl; // prints 1
    std::cout << roMem((void*)"short") << std::endl; // prints 1
    std::cout << roMem((void*)(const char*)2) << std::endl; // prints 0
    Test x((const char*)3);
    std::cout << roMem((void*)x.d) << std::endl; // prints 0
}

On my Ubuntu 22.04 machine the above seems to work (compiled with either g++ 12 and the ld linker, or clang++ 13). However, upon inspection (following this), that machine also produces the traditional layout with the data segment immediately after the text segment, in which case the old-school (ptr > &etext) && (ptr < &edata); would suffice. The open problem is machines that do not follow that traditional layout.
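For anyone who wants to repeat that kind of inspection on their own machine, here is a minimal sketch that records where the linker symbols and one string literal ended up. It assumes a Linux toolchain whose linker provides etext, edata and end (as GNU ld does); LayoutProbe, probeLayout and printLayout are hypothetical names introduced only for this illustration.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Linker-provided symbols (assumption: GNU ld or a compatible linker on Linux).
// Only their addresses are meaningful, never their values.
extern char etext, edata, end;

// Hypothetical helper type: the addresses we want to compare.
struct LayoutProbe {
    std::uintptr_t etext_addr;   // end of the text segment
    std::uintptr_t edata_addr;   // end of the initialized data
    std::uintptr_t end_addr;     // end of BSS
    std::uintptr_t literal_addr; // address of a string literal
};

static std::uintptr_t addrOf(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Collect the addresses for the current process.
LayoutProbe probeLayout() {
    static const char* literal = "probe";
    return {addrOf(&etext), addrOf(&edata), addrOf(&end), addrOf(literal)};
}

// Print the addresses so their relative order can be read off.
void printLayout(const LayoutProbe& p) {
    std::printf("etext   = 0x%" PRIxPTR "\n", p.etext_addr);
    std::printf("edata   = 0x%" PRIxPTR "\n", p.edata_addr);
    std::printf("end     = 0x%" PRIxPTR "\n", p.end_addr);
    std::printf("literal = 0x%" PRIxPTR "\n", p.literal_addr);
}
```

Calling printLayout(probeLayout()) from main prints the four addresses. If the literal does not fall between etext and edata on a given machine, the range check from the question cannot be relied on there.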


Solution

  • Read /proc/self/maps. Something along these lines:

    #include <cstdint>
    #include <fstream>
    #include <iomanip>
    #include <ios>
    #include <iostream>
    #include <limits>
    #include <optional>
    #include <sstream>
    #include <vector>
    
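    // Returns true if addr lies in a non-writable mapping, false if the mapping
    // is writable, and an empty optional if no mapping contains addr.
    std::optional<bool> roMem(uintptr_t addr) {
        auto f = std::ifstream("/proc/self/maps");
        while (f) {
            uintptr_t start;
            uintptr_t stop;
            char c;
            // Each line starts with "start-stop perms"; after the three reads
            // into c it holds the second permission flag ('w' or '-').
            if ((f >> std::hex >> start >> c >> stop >> c >> c) && start <= addr && addr < stop) {
                return c != 'w';
            }
            f.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return {};
    }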
    
    template <typename T> std::optional<bool> roMem(T* c) {
        return roMem(reinterpret_cast<uintptr_t>(c));
    }
    
    std::ostream& operator<<(std::ostream& ss, std::optional<bool> o) {
        return ss << (o ? *o ? "true" : "false" : "none");
    }
    
    static const char* str5 = "short";
    int main() {
        std::cout << roMem(str5) << '\n';
    }
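A possible variant of the same idea, sketched under the assumption that /proc/self/maps is available and that each line begins with "start-stop perms": it returns the whole permission string, so a caller can require that the mapping is both readable and non-writable (the version above also reports an inaccessible "---p" mapping as read-only). The names mappingPerms and isReadOnly are hypothetical and not part of any library.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: permission string ("r--p", "rw-p", ...) of the mapping
// that contains p, or std::nullopt if no mapping contains it.
std::optional<std::string> mappingPerms(const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream in(line);
        std::uintptr_t start = 0;
        std::uintptr_t stop = 0;
        char dash = 0;
        std::string perms;
        // Parse "start-stop perms"; the rest of the line is ignored.
        if ((in >> std::hex >> start >> dash >> stop >> perms) &&
            dash == '-' && start <= addr && addr < stop) {
            return perms;
        }
    }
    return std::nullopt;
}

// True only when the containing mapping is readable and not writable.
bool isReadOnly(const void* p) {
    const auto perms = mappingPerms(p);
    return perms && perms->size() >= 2 &&
           (*perms)[0] == 'r' && (*perms)[1] == '-';
}
```

With this sketch, isReadOnly("short") would be expected to hold for a string literal, while the address of a local variable or of a malloc'ed block should not qualify. Each call re-reads the file, which is slow but avoids working from a stale view when mappings change (for example after mmap or mprotect).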