Tags: c++, windows, loops, memory, system

Is it possible to get the address of the first string in memory of a process in c++?


Is it possible to get the address of the first string in the memory of a process in C++? That is, instead of iterating through the whole memory of the process to find strings, can I start iterating from the address of the first string? I mean finding the first variable of any type, not just strings; I only used strings as an example. Here is an example in code:

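for (uintptr_t i = 1; i < 10000000; i++)
{
    // Read raw bytes: copying bytes straight into a std::string object would corrupt it.
    char buf[64] = {};
    SIZE_T bytesRead = 0;
    if (ReadProcessMemory(hproc, (LPCVOID)i, buf, sizeof(buf) - 1, &bytesRead) && bytesRead > 0)
        std::cout << buf << std::endl;
}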

Should this be necessary? Can't I just start i at the address of the first string, or do this faster some other way?


Solution

  • By default, std::string objects allocate their internal storage from the same heap as every other object allocated with operator new. Nothing in memory distinguishes string buffers from any other heap content.

    That said, as a purely intellectual exercise, it is possible to instantiate basic_string with a custom allocator, which lets you watch the memory being allocated and deallocated.

    The full template declaration of std::basic_string is:

    template<
        class CharT,
        class Traits = std::char_traits<CharT>,
        class Allocator = std::allocator<CharT>
    > class basic_string;
    

    And std::string is actually std::basic_string<char>, relying on the defaults for Traits and Allocator.

    You could supply your own Allocator implementation, which might, for example, log the allocations and deallocations that your program performs, or place the allocations sequentially in some contiguous storage. You would only be tracking the strings you explicitly declared this way.
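
    A minimal sketch of such a logging allocator follows. The names AllocationRecord, allocation_log, LoggingAllocator, TrackedString and is_tracked are all hypothetical, the registry is not thread-safe, and the code assumes a C++11 or later compiler. It only sees buffers owned by strings declared as TrackedString.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <new>
    #include <string>
    #include <vector>

    // Hypothetical record of one buffer currently owned by a tracked string.
    struct AllocationRecord {
        const void* address;
        std::size_t bytes;
    };

    // Hypothetical process-wide registry of live buffers (not thread-safe).
    std::vector<AllocationRecord>& allocation_log() {
        static std::vector<AllocationRecord> log;
        return log;
    }

    // Minimal allocator: forwards to operator new/delete and records each buffer.
    template <class T>
    struct LoggingAllocator {
        using value_type = T;

        LoggingAllocator() = default;
        template <class U>
        LoggingAllocator(const LoggingAllocator<U>&) noexcept {}

        T* allocate(std::size_t n) {
            T* p = static_cast<T*>(::operator new(n * sizeof(T)));
            allocation_log().push_back({p, n * sizeof(T)});
            return p;
        }

        void deallocate(T* p, std::size_t) noexcept {
            auto& log = allocation_log();
            auto it = std::find_if(log.begin(), log.end(),
                                   [p](const AllocationRecord& r) { return r.address == p; });
            assert(it != log.end());  // every buffer we free should have been logged
            if (it != log.end()) log.erase(it);
            ::operator delete(p);
        }
    };

    // All LoggingAllocator instances are interchangeable.
    template <class T, class U>
    bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return true; }
    template <class T, class U>
    bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return false; }

    // A string type whose heap buffers are visible in allocation_log().
    using TrackedString = std::basic_string<char, std::char_traits<char>, LoggingAllocator<char>>;

    // True if p is the start of a buffer currently owned by some TrackedString.
    bool is_tracked(const void* p) {
        for (const AllocationRecord& r : allocation_log())
            if (r.address == p) return true;
        return false;
    }
    ```

    With this in place, a TrackedString long enough to need heap storage shows up in allocation_log() while it is alive and disappears when it is destroyed. Very short strings may never appear, because many implementations keep them inside the string object.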

    Note that a string with a custom allocator is a different type from std::string, so nothing can be moved between the two. In many cases, calling standard library functions that take or return strings will copy between your customised string and a standard-allocated one, so the view might still not be a complete one.

    As I understand it, current implementations of std::string store only the character content in the allocated storage, although an implementation could also keep other information there to help manage string growth.

    Note that the capacity and current length of the string are stored in the std::string object that points to this buffer, not in the buffer itself, so it is difficult to infer much about an allocated buffer from its content alone. If the buffer belongs to a live string, the C++ standard (since C++11) guarantees that it is null-terminated, which suggests it could be treated as a C char* string.
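
    Treating memory as C strings amounts to a heuristic along the following lines: look for runs of printable characters that end in a zero byte. This is only a sketch under stated assumptions. The function name find_c_strings and its min_len parameter are hypothetical, the bytes are assumed to have already been copied into a local buffer, and any match may be a coincidence rather than a real string.

    ```cpp
    #include <cassert>
    #include <cctype>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // Returns (offset, text) for every run of printable characters that is at
    // least min_len long and is immediately followed by a zero byte.
    std::vector<std::pair<std::size_t, std::string>>
    find_c_strings(const std::vector<unsigned char>& bytes, std::size_t min_len) {
        assert(min_len > 0);
        std::vector<std::pair<std::size_t, std::string>> found;
        std::size_t start = 0;
        std::string run;
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            const unsigned char c = bytes[i];
            if (std::isprint(c)) {
                if (run.empty()) start = i;  // remember where this run began
                run.push_back(static_cast<char>(c));
            } else {
                // Only a zero terminator turns a printable run into a candidate.
                if (c == 0 && run.size() >= min_len) found.emplace_back(start, run);
                run.clear();
            }
        }
        return found;  // a run that reaches the end without a terminator is ignored
    }
    ```

    The scan still has to walk every byte it is given. It cannot tell which candidates are buffers of live std::string objects, since the length and capacity live elsewhere.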

    Many implementations optimise for very small strings by storing them inline in the std::string object, with no dynamic allocation. They rely on some arbitrary rule to tell which layout is current (nullptr, dynamic or inline). One such rule is "knowing" that the capacity, a 64-bit integer, can never need more than 56 bits, so the "top byte" is always zero for a dynamic string, whereas for an inline string that byte holds the 1-byte size(), with a constant maximum size. The remaining bytes can then be used for the inline content, perhaps up to 23 characters! On 64-bit builds the inline capacity is typically 15 characters in libstdc++ and MSVC and 22 in libc++.
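
    Whether a given string is stored inline can be observed from outside without knowing the layout rule, by checking whether data() points inside the string object itself. The helper names stored_inline and check_long_string_is_external below are hypothetical, and the result for short strings depends on the standard library in use.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <string>

    // True when the character buffer lies inside the std::string object itself,
    // which is how small-string optimisation shows up from the outside.
    bool stored_inline(const std::string& s) {
        const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
        const auto object_end = object_begin + sizeof(std::string);
        const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
        return buffer >= object_begin && buffer < object_end;
    }

    // A 1000-character string is larger than the string object itself,
    // so its characters must live in a separate allocation.
    void check_long_string_is_external() {
        const std::string big(1000, 'x');
        assert(!stored_inline(big));
    }
    ```

    This also shows why copying the bytes of a std::string object out of another process is not enough: for anything but a short string, the object holds only a pointer that is meaningful in the original address space.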

    But other schemes are possible. For example, the old Microsoft string class (not std::string) stored all the length and capacity information at the start of the pointed-to buffer, and the string object was a simple pointer to the text part that follows it. This was very convenient for use in printf()! Other pre-C++11 std::string implementations shared buffers between strings and stored reference counts, but the C++11 rules on when references and iterators into a string may be invalidated have made that virtually untenable.
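
    The header-before-text layout can be sketched roughly as follows. This illustrates the general idea only and is not the actual Microsoft implementation. Header, make_prefixed, header_of and free_prefixed are hypothetical names.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdlib>
    #include <cstring>
    #include <new>

    // Hypothetical bookkeeping block placed immediately before the characters.
    struct Header {
        std::size_t length;
        std::size_t capacity;
    };

    // Allocates [Header][characters + terminator] and returns a pointer to the
    // characters, so the handle can be passed straight to printf("%s", ...).
    char* make_prefixed(const char* text) {
        assert(text != nullptr);
        const std::size_t len = std::strlen(text);
        void* block = std::malloc(sizeof(Header) + len + 1);
        if (block == nullptr) throw std::bad_alloc();
        new (block) Header{len, len};
        char* chars = static_cast<char*>(block) + sizeof(Header);
        std::memcpy(chars, text, len + 1);  // copies the zero terminator too
        return chars;
    }

    // Steps back from the text to the header that precedes it.
    const Header* header_of(const char* chars) {
        return reinterpret_cast<const Header*>(chars - sizeof(Header));
    }

    // Releases the whole block, starting at the header.
    void free_prefixed(char* chars) {
        std::free(chars - sizeof(Header));
    }
    ```

    With this layout the length and capacity sit in memory right next to the text, unlike the std::string arrangement where they sit in the separate string object.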