Tags: c, objective-c, function-pointers, undefined-behavior, objc-message-send

How does casting and calling objc_msgSend() not invoke undefined behavior?


I observed the usage of objc_msgSend to send messages to Objective-C IDs from pure C. The usage is not well documented but I found an example here.

What confuses me is that the function pointer is cast to a different type, with different arguments and/or return values, and then called. These are the macros from the linked answer:

#define msg ((id (*)(id, SEL))objc_msgSend)
#define msg_int ((id (*)(id, SEL, int))objc_msgSend)
#define msg_id  ((id (*)(id, SEL, id))objc_msgSend)
#define msg_ptr ((id (*)(id, SEL, void*))objc_msgSend)
#define msg_cls ((id (*)(Class, SEL))objc_msgSend)
#define msg_cls_chr ((id (*)(Class, SEL, char*))objc_msgSend)

However, I thought that converting a function pointer and calling it through a different signature was undefined behavior. How could a C function, or even a C-callable function such as objc_msgSend(), be implemented so that it can dynamically accept different argument lists and/or return types? How does that work, and how does doing so evidently not invoke undefined behavior?


Solution

  • There appear to be two questions here:

    1. Does casting a function pointer to another function pointer type and calling the result trigger undefined behavior?
    2. How can you write objc_msgSend() such that you can pass it any number of arguments and expect the correct return type, arbitrarily?

    Undefined Behavior

    For the first: I started fleshing out this part of the answer by referencing the C11 draft standard (the finalized C standard documents are behind a paywall, but the published draft docs are functionally identical), but as I'm not a language-lawyer, I'm not entirely confident in answering this part of the question to your satisfaction.

    The relevant parts of the standards doc to reference:

    • §6.3.2.3¶8

      A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

      (Emphasis mine)

      If you cast between two "compatible" function pointer types, it's valid to call the cast function. So when are two functions "compatible"?

    • §6.7.6.3¶15

      For two function types to be compatible, both shall specify compatible return types. Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types. If one type has a parameter type list and the other type is specified by a function declarator that is not part of a function definition and that contains an empty identifier list, the parameter list shall not have an ellipsis terminator and the type of each parameter shall be compatible with the type that results from the application of the default argument promotions. If one type has a parameter type list and the other type is specified by a function definition that contains a (possibly empty) identifier list, both shall agree in the number of parameters, and the type of each prototype parameter shall be compatible with the type that results from the application of the default argument promotions to the type of the corresponding identifier. (In the determination of type compatibility and of a composite type, each parameter declared with function or array type is taken as having the adjusted type and each parameter declared with qualified type is taken as having the unqualified version of its declared type.)

    • §6.7.6.3¶10

      The special case of an unnamed parameter of type void as the only item in the list specifies that the function has no parameters.
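The compatibility rule in §6.7.6.3¶15 can be probed at compile time. The following is a minimal C11 sketch, not something from the standard or the runtime: `IS_COMPATIBLE`, `int_fn`, and the function names are hypothetical helpers made up for illustration. It relies on `_Generic`, which picks an association when the type of the controlling expression is compatible with the type named in that association.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical helper: 1 if the type of `expr` is compatible with `T`, else 0. */
#define IS_COMPATIBLE(expr, T) _Generic((expr), T: 1, default: 0)

typedef int (*int_fn)(int);

/* Identical function pointer types are trivially compatible. */
static_assert(IS_COMPATIBLE((int (*)(int))0, int_fn),
              "identical types are compatible");

/* Qualifiers on a parameter are dropped when checking compatibility. */
int same_after_unqualifying(void) {
    return IS_COMPATIBLE((int (*)(const int))0, int_fn);
}

/* A different parameter type breaks compatibility. */
int differs_in_parameter(void) {
    return IS_COMPATIBLE((int (*)(long))0, int_fn);
}

/* A different return type breaks compatibility. */
int differs_in_return(void) {
    return IS_COMPATIBLE((long (*)(int))0, int_fn);
}

/* A different number of parameters breaks compatibility. */
int differs_in_count(void) {
    return IS_COMPATIBLE((int (*)(int, int))0, int_fn);
}
```

Only the first function reports compatibility; the other three describe exactly the kinds of mismatch that the msg macros introduce relative to one another.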

    This clause is relevant because recent versions of the Objective-C runtime headers declare objc_msgSend as void objc_msgSend(void) by default. If you squint just right, you might be able to read "that the function has no parameters" as equivalent to having "an empty parameter list" in some sense, in which case it can safely be passed any number of arguments, since it doesn't specify any. (Somewhat intuitively: the risk in calling through an incompatible function pointer type is that memory for an argument is read as if it were of another type, which is invalid. If a function declares that it accepts no parameters, it claims that it will never read any values passed to it, so the compiler can assume that passing any arguments it likes is harmless, because they'll never be used. In practice, of course, the function can do whatever it wants.)
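The cast-then-call pattern from the question can be reduced to portable C. This is a minimal sketch under stated assumptions, not the runtime's code: `opaque_fn`, `erase_add`, `erase_half`, `call_as_add`, and `call_as_half` are hypothetical names. Per §6.3.2.3¶8 quoted above, converting a function pointer to another function pointer type is always allowed; what has to hold is that the pointer type used at the call site is compatible with the function that is actually reached.

```c
#include <assert.h>

/* Hypothetical type-erased function pointer, analogous to a declaration
   that names no parameters and returns nothing. */
typedef void (*opaque_fn)(void);

static int add(int a, int b) { return a + b; }
static double half(double x) { return x / 2.0; }

/* Converting to the opaque type for storage is permitted by itself. */
opaque_fn erase_add(void)  { return (opaque_fn)add; }
opaque_fn erase_half(void) { return (opaque_fn)half; }

/* Convert back to the function's real type before calling it. */
int call_as_add(opaque_fn f, int a, int b) {
    assert(f != 0);
    return ((int (*)(int, int))f)(a, b);
}

/* The cast also tells the compiler what kind of return value to expect. */
double call_as_half(opaque_fn f, double x) {
    assert(f != 0);
    return ((double (*)(double))f)(x);
}
```

Calling `add` through the `opaque_fn` type itself, without converting back first, is the case that §6.3.2.3¶8 leaves undefined.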

    The return value aspect is a bit tougher to explain, hence my hesitance. §6.2.7 describes compatibility between types, but it doesn't mention void in any way, and is otherwise pretty vague. From elsewhere in the standard:

    • §6.2.5¶1

      At various points within a translation unit an object type may be incomplete (lacking sufficient information to determine the size of objects of that type)

    • §6.2.5¶15

      The void type comprises an empty set of values; it is an incomplete object type that cannot be completed.

    So void is an "incomplete" type, whose size and alignment can never be known, but it doesn't appear to be explicitly stated anywhere that incomplete types and complete types (or void) are incompatible. (For the most part, a type being "incomplete" means only that the compiler isn't aware of its definition and can't help you prevent invalid casts or alignments; I'm not aware of stricter requirements on such types.)

    The C standard is full of holes like this, where behavior can be somewhat sneakily gleaned not by what is said, but by what is left out. Someone with more experience than me in this area may be able to point to something in the standard which refutes this explicitly, but effectively, it appears that the standard implicitly leaves some leeway in expected behavior to allow this to be valid.

    Writing objc_msgSend()

    How could a C … function be written …?

    Here's the trick: objc_msgSend is necessarily written in assembly because it cannot possibly be written in C. It's not even really a function in the way that you might expect.

    The purpose of objc_msgSend is to take the arbitrary arguments it's given, find the pointer to the method with the given selector name for the receiver, and pass those arguments along exactly. In C, you can't do this, because C functions set up stack frames and have to preserve certain registers and stack values. Setting up a stack frame also means that the method you call has to return to objc_msgSend itself, after which the stack frame has to be torn down. This is a lot of wasted work, and it also means that every stack trace would be littered with objc_msgSend frames. Writing the function directly in assembly bypasses these limitations.

    Mike Ash goes into objc_msgSend in significantly more detail in several articles on his blog[1][2], but the gist:

    1. objc_msgSend is exposed as a C function, but its implementation is in assembly
    2. When called from C, the stack and registers are set up by the caller exactly how the recipient method expects to receive them, because it appears to have a regular C calling convention
    3. objc_msgSend itself leaves the argument registers and the stack as it found them, and doesn't set up a stack frame or modify the return address; it simply finds the correct function pointer, based on the recipient object and the method name, and passes execution off to it
    4. When the method is then called, because objc_msgSend hasn't disturbed the arguments or the stack, it appears that the method was called directly, without objc_msgSend ever having been there. And because objc_msgSend hasn't modified the return address for the method, execution returns directly to the caller of objc_msgSend, who can then safely read the return value because they received it directly from the called method
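The lookup half of the steps above can be sketched in portable C. This is a toy model under stated assumptions, not the Objective-C runtime: `toy_imp`, `toy_method`, `toy_class`, `toy_object`, `toy_lookup`, and the `send_*` wrappers are all hypothetical, and selectors are plain strings. The one thing it cannot reproduce is the jump itself: here the lookup returns the pointer and the caller makes the call, whereas the real objc_msgSend transfers control to the implementation directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are real runtime declarations. */
typedef void (*toy_imp)(void);

struct toy_method {
    const char *selector;  /* method name */
    toy_imp     imp;       /* implementation, stored with its type erased */
};

struct toy_class {
    const struct toy_method *methods;
    size_t                   count;
};

struct toy_object {
    const struct toy_class *isa;
    int                     value;
};

/* Two "methods" with different signatures; both take receiver and selector first. */
static int get_value(struct toy_object *self, const char *sel) {
    (void)sel;
    return self->value;
}

static double scaled(struct toy_object *self, const char *sel, double factor) {
    (void)sel;
    return self->value * factor;
}

static const struct toy_method counter_methods[] = {
    { "value",   (toy_imp)get_value },
    { "scaled:", (toy_imp)scaled    },
};

const struct toy_class counter_class = { counter_methods, 2 };

/* Find the implementation for (receiver, selector); NULL if there is none. */
toy_imp toy_lookup(const struct toy_object *receiver, const char *sel) {
    assert(receiver != NULL && sel != NULL);
    const struct toy_class *cls = receiver->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].selector, sel) == 0) {
            return cls->methods[i].imp;
        }
    }
    return NULL;
}

/* The caller supplies the signature by casting, as the msg macros do. */
int send_value(struct toy_object *obj) {
    toy_imp imp = toy_lookup(obj, "value");
    assert(imp != NULL);
    return ((int (*)(struct toy_object *, const char *))imp)(obj, "value");
}

double send_scaled(struct toy_object *obj, double factor) {
    toy_imp imp = toy_lookup(obj, "scaled:");
    assert(imp != NULL);
    return ((double (*)(struct toy_object *, const char *, double))imp)(obj, "scaled:", factor);
}
```

Each `send_*` wrapper is tied to one signature, which is why a single C function cannot stand in for objc_msgSend: the part that works for every signature has to happen below the level of C.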

    Because you have to cast objc_msgSend's type to actually call it from C, if you've got the types right, the compiler will correctly set up the arguments to the method and also read the return value for you, all correctly.