Search code examples
clanguage-lawyerconstant-expressionconstantfolding

Minimum Guaranteed Constant Folding in C


Question

I am curious if there is any guarantees about constant folding being done in C.

Where I've looked

This link on a site whose reputability is unknown to me makes an offhand comment:

All C compilers can fold integer constant expressions that are present after macro expansion (ANSI C requirement).

But I don't see anything in e.g. The C Programming Language, second edition (which I presume was thoroughly updated to account for all details of ANSI C). But after checking the index for references to relevant words, I found nothing that promises this. Pages 38 and 209 in particular come close, because they say any expression which could be evaluated at compile-time may be used where a constant could be used (perhaps with some restrictions, if we are pedantic enough), and it says that such expressions "may" be evaluated at compile-time, not "will"/(some synonym).

I searched this C89 final draft. The word "folding" and "folded" yielded no results of value, and the search "constant expression" generated 63 matches, of which I checked about half. The primary part of interest seems to basically be the same as the book (it uses the word "can" instead of "may", but those are synonymous in this context).

Both of these seem to logically strongly imply to me that every ANSI C compiler must have basic constant-folding capabilities, but at the same time, there doesn't seem to be any hard prohibition against constant expressions being compiled into code which computes the expression at runtime (such an implementation still benefits from constant expressions because the compiler can generate code which compute it once and then assumes the value won't change, and such an implementation might be forced by limitations in a given underlying architecture - e.g. several RISC architectures must either use two instructions to initialize certain possible values, or load them from a memory location).

I also briefly searched this C99 final draft, but "folding" yielded one result of no value, while "folded" and "constant" had over a hundred matches each that I currently couldn't allocate the time to crawl through.

Motivation

I wrote these macros for more semantic/intent clarity in some bit-twiddling code:

#define UCHAR_LOW_N_BITS_m(n) (unsigned char )(UCHAR_MAX >> (CHAR_BIT - (n)))
#define UCHAR_NTH_BIT_m(n) (unsigned char )(1 << (n))

..where n is always an integer literal. I want to be soothed, to hear a reassuring voice say "it's okay, every C compiler in use that remotely matters will fold these constants for you". (P.S. I asked a separate question about to whether UCHAR_NTH_BIT_m should act as if bits start from the 0th or 1st bit, hopefully in the right place.)

And yes, the bottom one could be turned into separate macros, e.g. #define UCHAR_1ST_BIT_m (unsigned char )1 through #define UCHAR_3RD_BIT_m (unsigned char )4 or however many I happen to need in the code - and while I'm undecided as to which of those is better, it's probably a moot point, because if I want to be a good pedantic-language-lawyer-type C programmer, I can't exactly avoid the top one (gotta make sure the code does the right thing on those DSP/embedded and ancient-mainframe C implementations).


Solution

  • In the C standard, the term for compile-time is translation time, and events that happen at compile-time may be described as during translation. You can search the standard for those terms.

    The closest thing in this standard to your topic is the definition of constant expression in section 6.6 (C11). The overview is:

    A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.

    Note that "can be" does not mean "must be". We would expect a good quality compiler to evaluate constant expressions at compile-time although this is not an absolute requirement.

    Section 6.6 is too big to paste the whole thing here, but by consulting the C Standard (or a draft such as N1570) you can read about which expressions are defined as constant expressions and it would be reasonable to assume that your compiler evaluates them at compile-time.

    If n >= 0 && n < CHAR_BIT * sizeof(int) - 1, and n is an integer constant expression, then (unsigned char )(1 << (n)) is an integer constant expression because: its operands are integer constants, it doesn't use one of the operators which are forbidden in constant expressions (e.g. ++), and the cast is allowed because it casts from integer type to another integer type.

    If n is outside this range then the expression is not a constant expression, and causes undefined behaviour.

    Your other expression is similar; it is an ICE so long as CHAR_BIT - n falls in that same valid range.