Search code examples
c++std-function

std::function internal memory organization and copies; passing reference vs value


When a std::function is copied, are the code instructions it references copied as well?

An std::function is initialized via some form of callable, that points to executable code in some way (like a function pointer typically does). Now, when a function-object is copied, is this executable code runtime copied or internally referenced? To rephrase the question: If one instance of std::function is copied, are there then multiple copies of the same compiled code instructions in memory? Is std::function an object that actually stores the function code or is it more an abstraction for a function pointer?

The former would seem wasteful and I don't suspect it, but everything I found so far on the subject is either too vague, lacking or too specific for me to say for me for sure. For example

When the target is a function pointer or a std::reference_wrapper, small object optimization is guaranteed, that is, these targets are always directly stored inside the std::function object, no dynamic allocation takes place. Other large objects may be constructed in dynamic allocated storage and accessed by the std::function object through a pointer. - cppreference

gives some hints about how it's done but seems still too vague and maybe is not related at all to this question, because of further abstractions inside of std::function.

For context: I am trying to refactor some bad C-ish code that maps input-events (keystrokes, mouse input and the like) to a certain behavior, which is executed upon a target data structure which can be interpreted by the program as more specific input that have semantic context other than than keystrokes (, aka keybindings). One can suspect that requirements of behaviours varies drastically.
This was previously implemented with lists of defines and numbers specifying input-event-ids, and hard-coded behavior, which was selected by switch-case. We quickly approach the border of where this intial way of doing it becomes unwieldly.
To get out of the defined lists to an expandable, declarative, object oriented and flexible design I consider higher order functions.
Especially since some behavior is quite simple and repeatedly needed (like for example the toggle of one value in the output data structure) other behaviors are more complex with multiple conditions attached, I'd like to declare some of the behavior statically, but still would like to be open to just assign some special lambda in some cases. Since I need to store behavior per input (key, mousebutton, mouse-axis, etc.) and potentially many copies of one certain behaviour type can be instantiated in one time for different sets of keybindings, I wonder if this behavior should be referenced, rather than stored by value. In the former case, fresh lambdas would need to be owned by the behavior structures, but statically declared behavior does not, which pragmatically would lead to some shared_ptr shenanigans. In the latter case, by value, this would not be an issue, but I wouldn't want multiple copies of for example the toggle behavior to cause too much redundant overhead instead.


Solution

  • (Note: the whole discussion below is a little simplified. AFAIK, none of it is wrong, but I did omit some details and edge cases and definitions and implementation stuff.)

    The std::function does not copy any executable code. The executable code is always merely pointed to, by std::function. And when the std::function gets copied, the pointer gets duplicated (which is completely fine, because executable code is never freed either.) So far, there is no difference between a plain old function pointer and a std::function.

    But that's not the whole story.

    Contrary to function pointers, instances of std::function can carry around "state" as well as a pointer to the executable code, and the whole hubbub about std::function having to allocate/deallocate and copy/move data around is about this extra state, not the function pointer.

    Suppose that you have code like this:

    (And note that although I've used a lambda here, the following explanation would have been equally applicable for "functors" and "function objects" and "bind results" and other forms of callable things in C++, all except plain old function pointers.)

    int x = 42, y = 17;
    std::function<int()> f = [x, y] {return x + y;};
    

    Here, f not only stores the pointer to the executable code for return x + y;, but it also has to remember the value of x and y. Since the amount of state that you can "capture" in this way is not limited, then - by definition - the std::function must allocate memory from the heap upon construction, and deallocate it, copy it and move it at appropriate times. Again, it is this extra "state" that gets copied, not the code.

    Let's review: each std::function needs to be able to store at least a pointer to executable code, and 0 or more bytes of extra captured state. If there is no captured state, a std::function is essentially the same as a function pointer (although in practice, std::functions are usually implemented polymorphically and have other stuff in there.)

    Some (most) implementations of std::function that I'm aware of employ an optimization that is called "Small Object Optimization". In these implementations, in addition to the space for the pointer to code, the std::function object has some more (fixed amount of) space inside its instance (i.e. as a member of its class, as opposed to somewhere else on the heap) and will use that area if the total number of bytes of the captured state would fit in there. This eliminates the heap allocation, which is important in some use cases and would balance out the additional memory used (when there is no or little state to capture.)