Search code examples
clanguage-lawyerc23

Can strlen be [[unsequenced]]?


The wording of the C23 working draft on unsequenced functions is unclear to me. Among other properties, unsequenced functions have to be independent:

(6) An object X is observed by a function call

  • (6.1) if both synchronize,
  • (6.2) if X is not local to the call,
  • (6.3) if X has a lifetime that starts before the function call, and
  • (6.4) if an access of X is sequenced during the call;

the last value of X, if any, that is stored before the call is said to be the value of X that is observed by the call.

A function pointer value f is independent if

  • (6.5) for any object X that is observed by some call to f through an lvalue that is not based on a parameter of the call, then all accesses to X in all calls to f during the same program execution observe the same value;
  • (6.6) otherwise if the access is based on a pointer parameter, there shall be a unique such pointer parameter P such that any access to X shall be to an lvalue that is based on P.

A function definition is independent if the derived function pointer value is independent.

- N3096 $6.7.12.7 p6, re-formatted for the sake of readability

My question is whether functions like strlen can be independent. Consider this minimal implementation:

size_t strlen(const char* s) {
    size_t len = 0;
    for (; *s; ++s) { ++len; }
    return len;
}

Firstly, is *s considered to be access that is based on a parameter? I believe the access is based on a parameter, namely a pointer parameter, so only the restrictions of (6.6) are relevant.

However, (6.6) could be an issue. Notice that (6.5) says "all accesses to X in all calls to f", whereas (6.6) says "all accesses to X", which is broader, and may apply to access outside the function too. The following scenario is a problem then:

struct string {
    char data[N];
} str;
// ...
strlen(str.data); // A
str = ...;
strlen(str.data); // B

Not all accesses to str globally take place through a unique pointer parameter s to strlen. Some of them (str = ...) don't even involve pointers whatsoever. If my understanding is correct, this disqualifies strlen from being independent, and thus unsequenced.

In summary, are my interpretations correct, and strlen cannot be unsequenced? Is this perhaps just a wording issue, but it was intended to be a unsequenced?


Note on possible intent

The proposal for [[unsequenced]] claims that unlike GCC's [[gnu::const]], pointer parameters are supported. However, I am not certain whether the wording reflects that, and to what extent.

See N2956 5.8 Some differences with GCC const and pure


Solution

  • My question is whether functions like strlen can be independent.

    As a preliminary matter, the wording of the relevant section of the spec is terrible. As in, having grammar not matched to the seeming intent in some places, and being grammatically incorrect in other places, in ways that make the intended meaning difficult to parse, and being a bit redundant. This rewrite better conveys what I think the spec means to say:


    An object X is observed by a function call if

    • X has a lifetime that starts before the call, and
    • X and the call synchronize, and
    • an access of X is sequenced during the call.

    In that case, if any store to X happens before the call then the value written by the last such store is said to be the value of X that is observed by the call. Otherwise, the initial value of X is the one observed by the call.

    A function pointer value f is independent if for every object X that is observed by a call C to *f,

    • if any access by which such a C observes X is via an lvalue that is based on a pointer parameter of *f, then there is a unique pointer parameter P of *f such that every access to X by such a C shall be via an lvalue that is based on P;

    • otherwise, every such C during the same program execution that observes the value of X observes the same value.

    A function definition is independent if the derived function pointer value is independent.


    The use of a function pointer as the primary subject of the definition is drawn from the original text. I take it to be meant to clarify that independence is about a function as identified by its address, as opposed to by name or implementation. Thus, it could be that a program contains two static functions with the same name and identical code, with one independent and the other not.

    It is worth noting that a function or function pointer being independent -- if that is recognized by the compiler -- speaks to possible reordering of the function call relative to other operations, which is the point of being unsequenced. But independence is only about a function's reads, so it is not enough on its own to enable reordering of calls.

    Consider this minimal implementation:

    size_t strlen(const char* s) {
        size_t len = 0;
        for (; *s; ++s) { ++len; }
        return len;
    }
    

    Firstly, is *s considered to be access that is based on a parameter?

    Short answer: yes.

    Long answer: The *s is an lvalue. It appears in a context where lvalue conversion will be performed. That lvalue conversion performs a read of the referenced object, which is an access to that object. I see no room for any interpretation that *s is not based on s, which is a parameter of the function.

    However, (6.6) could be an issue. Notice that (6.5) says "all accesses to X in all calls to f", whereas (6.6) says "all accesses to X", which is broader, and may apply to access outside the function too.

    Sort of. The provision is not about pointer values, but rather about the function parameter by which pointers are fed to the function. Therefore, interpreting it as applying outside the context of calls to f would mean that the object in question could not be accessed at all outside of f without disqualifying f as independent. I am confident that that is not the intent, and my above rewrite of the provision is clearer about that.

    In summary, are my interpretations correct, and strlen cannot be unsequenced? Is this perhaps just a wording issue, but it was intended to be a unsequenced?

    If your interpretation were correct then substantially no function would be unsequenced (because a function must be independent to be unsequenced).

    As I interpret the spec, the strlen() definition provided defines a function that is unsequenced. And I note that that's independent of whether it actually carries the [[unsequenced]] annotation.