
How are expression values stored before they become assigned in C?


Beginner question here: in C, when you assign a numerical value to a variable, the language makes it clear that the variable has a specific type, which can hold only a specific range of numerical values. If the value one assigns to a variable is outside the range defined by its type, it will overflow. Example:

char foo = 1000;

Here, foo is not expected to be equal to 1000. I suppose assignment involves casting the expression's value to the variable's type; consequently, assignment could modify the value of the expression, depending on whether overflow occurs, which in turn depends on the variable's type and the expression's value. In more complex cases (not hardcoded values), I suppose that the value which will be stored in memory is not known before the assignment actually happens.

My question is: during code execution, how are the values of expressions stored before they are assigned?

Indeed, they have to be written on a certain number of bits, which gives them a minimum, a maximum, and, in the case of floating-point values, a maximum precision.

As an application of this question: is it possible to write an expression involving a hardcoded number so big that it actually cannot be stored properly during execution? Imagine a hardcoded expression involving a number a little larger than the maximum possible value, divided by two, such that the expression would theoretically produce a representable number, but in practice does not.


Solution

  • Ordinarily, each expression in C, including a subexpression inside another expression, has a type. For identifiers, the type is declared. For constants or literals, the type is a consequence of the form and value of the constant or literal. For the results of operators, the type is determined by the types of the operands and rules for the operator.

    For example, for integer constants, there is a table in C 2018 6.4.4.1 5. For decimal constants without a suffix (like l for long), it says the type is the first of int, long int, and long long int that can represent the value. The following paragraph says that, if the value does not fit in any of those, the constant may have some extended integer type provided by the C implementation, if that type can represent the value. It also says that, if the value cannot be represented by any type in its list and the constant has no extended integer type, then the constant “has no type.” If a constant has no type, the program violates the constraint in the constraints clause 6.4.4 2, which says “Each constant shall have a type…,” and the behavior is not defined by the C standard. When a program violates a constraint listed in a constraints clause, the compiler must produce a diagnostic message.
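
The rule for unsuffixed decimal constants can be checked with a small sketch. The macros IS_INT and IS_WIDER_SIGNED and the three function names are hypothetical helpers made up for this illustration, and the sketch assumes a C11 compiler on an implementation where int is 32 bits wide (the static_assert stops compilation otherwise). It also touches the question's "a little larger than the maximum, divided by two" case, for the situation where the maximum in question is INT_MAX.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* INT_MAX */

/* Hypothetical helper macros: report which type an expression has. */
#define IS_INT(x)          _Generic((x), int: 1, default: 0)
#define IS_WIDER_SIGNED(x) _Generic((x), long: 1, long long: 1, default: 0)

/* Assumption for this sketch: int is 32 bits, so INT_MAX is 2147483647. */
static_assert(INT_MAX == 2147483647, "this sketch assumes a 32-bit int");

/* 2147483647 fits in int, so the unsuffixed decimal constant has type int. */
int max_int_constant_is_int(void)
{
    return IS_INT(2147483647);
}

/* 2147483648 does not fit in int, so the constant gets the first of
   long and long long that can represent it. */
int big_constant_is_wider(void)
{
    return IS_WIDER_SIGNED(2147483648);
}

/* The division is carried out in that wider type, so nothing overflows. */
long long half_of_big_constant(void)
{
    return 2147483648 / 2;
}
```

Under the stated assumption, the first two functions return 1 and the third returns 1073741824. A constant too large for every standard and extended integer type is the case that would instead draw the diagnostic described above.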

    For many operators, the rules say that integer operands are promoted to a width of at least int. (There are some technicalities I am omitting here, but this is the primary effect of the integer promotions.) This means you cannot do arithmetic on just a char or short value in a C implementation in which those types are narrower than an int. Further, for many operators with two operands, the operands are converted to some common type, generally the “bigger” type (although again there are some technicalities, which can be more troublesome due to conversions between signed and unsigned types).
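
As a rough illustration of the promotions and the common-type conversions, here is a minimal sketch. The function names are invented for this example, and it assumes the usual situation where char is narrower than int.

```c
#include <stddef.h>   /* size_t */

/* sizeof only inspects the type of its operand. Both char operands are
   promoted, so a + b has type int and this returns sizeof(int). */
size_t size_of_char_sum(void)
{
    char a = 1, b = 2;
    return sizeof(a + b);
}

/* 200 + 100 does not fit in an 8-bit unsigned char, but the addition is
   done in int after promotion, so the result is 300. */
int sum_of_unsigned_chars(void)
{
    unsigned char a = 200, b = 100;
    return a + b;
}

/* One of the signed/unsigned technicalities: i is converted to unsigned
   int before the comparison, becoming a very large value, so the
   comparison yields 0 (false). */
int minus_one_less_than_one_unsigned(void)
{
    int i = -1;
    unsigned int u = 1;
    return i < u;
}
```

Calling these on a typical implementation gives sizeof(int), 300, and 0 respectively.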

    All of these rules about types affect the values that will result from evaluating expressions. If you add some value of type X to some value of type Y and multiply by some value of type Z, the rules will be applied to determine what value results. But the rules do not say how the value must be represented while the program is working with the expression. A compiler may generate code that processes values in registers, that keeps some constants in immediate fields in instructions, that builds some constants on the fly during program execution, that does not contain actual instructions that perform the explicit operations because the compiler optimized the expression to a different form, and more. The C standard only requires that a value be represented in a certain way when it is stored in an object (memory reserved to hold a value, as with a definition of a variable). (And even that can be removed by optimization, as long as the program behaves the same way with regard to the observable effects as defined by the C standard.)
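
To connect this back to the question's example, the sketch below contrasts the value of an expression with the value that ends up in an object. It uses unsigned char rather than plain char so that the narrowing conversion is fully defined by the standard (for a signed target type the out-of-range result is implementation-defined). The function names are hypothetical, and the sketch assumes int is wider than unsigned char.

```c
#include <limits.h>   /* UCHAR_MAX */

/* The expression c + 1 has type int after promotion, so its value is
   UCHAR_MAX + 1. No narrowing has happened yet. */
int value_of_expression(void)
{
    unsigned char c = UCHAR_MAX;
    return c + 1;
}

/* Storing that value in an unsigned char object converts it to the
   object's type: it is reduced modulo UCHAR_MAX + 1, giving 0. */
unsigned char value_after_store(void)
{
    unsigned char c = UCHAR_MAX;
    unsigned char d = c + 1;
    return d;
}

/* The same conversion applied to any int value, e.g. 1000 becomes 232
   when unsigned char is 8 bits wide. */
unsigned char store_in_unsigned_char(int value)
{
    unsigned char narrow = value;
    return narrow;
}
```

Only the results are specified. Where the intermediate int value lives (a register, an immediate operand, or nowhere at all after optimization) is up to the compiler.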