Does type-casting each element of an array take less space than copying the array into a new one?

I am currently writing a program that, given two input matrices (int8_t and float respectively), computes their product.

For memory reasons, I do not want the entire int8 matrix to be converted to a floating type (or any type that occupies more than 8 bits in memory). I believe that in C, when multiplying an int by a float, there is an implicit conversion of the int to float in order to do the operation.

My question now is: if I cast my int8 input to float and then do the computation, what actually happens in memory? Does the cast overwrite some otherwise unused memory, or does it take additional space, as if I had created a float array and copied my data into it?

for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
        float sum = 0;
        for (int l = 0; l < k; ++l) {
            sum += (float) input_a[i*k + l] * input_b[l*m + j];
        }
        output[i*m + j] = sum;
    }
}

Solution

sum += (float) input_a[i*k + l] * input_b[l*m + j]; specifies a computation to be performed. It overtly says to fetch element i*k + l of input_a, convert it to float, fetch element l*m + j of input_b, multiply the two values (input_b is already float in the question, so that operand needs no conversion), add the product to sum, and store the result in sum.
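To illustrate that the cast yields a new value rather than rewriting the stored element, here is a minimal sketch. The helper name element_as_float is hypothetical and not part of the question's code.

```c
#include <stdint.h>

/* Hypothetical helper: reads one int8_t element and returns its value
   as a float. The cast produces a new float value; the element stored
   in the array is not modified and still occupies one byte. */
float element_as_float(const int8_t *array, int index)
{
    return (float) array[index];
}
```

After calling it, the int8_t array holds exactly the same bytes as before; only the returned value has type float.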

Nothing in this says to store anything into any memory other than sum. The C standard allows a compiler to implement this computation in any way that does not alter the observable behavior of the program, which consists of its output, its input/output interactions, and accesses to volatile objects. With most compilers and most processors, the compiler will generate code to perform this operation entirely in processor registers:

The subscripts will be calculated in processor registers.
The array elements will be loaded into processor registers.
The values will be converted to float in processor registers.
The multiplication and addition will be performed in processor registers.
The result will either be stored to sum in memory, or compiler optimization will keep sum in a processor register until the final output[i*m + j] = sum; is performed.
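The steps listed above can be made visible by writing the inner loop with explicit scalar temporaries. This is only a sketch: the function name dot_int8_float and the temporaries are hypothetical, and whether each value really lives in a register depends on the compiler and target.

```c
#include <stdint.h>

/* Hypothetical helper: computes one output element of the product.
   Every temporary below is a single scalar that a compiler would
   typically keep in a register; no float array is created. */
float dot_int8_float(const int8_t *input_a, const float *input_b,
                     int i, int j, int k, int m)
{
    float sum = 0.0f;
    for (int l = 0; l < k; ++l) {
        int index_a = i*k + l;          /* subscript arithmetic */
        int index_b = l*m + j;
        int8_t a = input_a[index_a];    /* load one 1-byte element */
        float b = input_b[index_b];     /* load one float element */
        float a_as_float = (float) a;   /* convert a single value */
        sum += a_as_float * b;          /* multiply and accumulate */
    }
    return sum;
}
```

Calling it for every (i, j) pair and storing the result in output[i*m + j] reproduces the loop from the question.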

In somewhat unusual, yet not extraordinary circumstances, the compiler may use additional memory:

Depending on availability of processor registers and the context of surrounding code, the compiler may have to save some processor registers to the hardware stack to free up the registers for the computations required in this loop. This will be at most a few stores before the loops and a few loads afterward.
In some processors with strict separations between integer and floating-point registers, the compiler may need to use memory to transfer data between them.

In no ordinary C implementation would the compiler generate an entire array of float elements to hold the various values that this code converts to float.
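For contrast, a float copy of the matrix only comes into existence if the program asks for one explicitly. The sketch below puts the two approaches side by side. The function names are hypothetical, and the second version assumes n and k are positive so that the allocation size is nonzero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Converts each element as it is used: no storage beyond a few scalars. */
void matmul_on_the_fly(const int8_t *input_a, const float *input_b,
                       float *output, int n, int k, int m)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            float sum = 0.0f;
            for (int l = 0; l < k; ++l)
                sum += (float) input_a[i*k + l] * input_b[l*m + j];
            output[i*m + j] = sum;
        }
    }
}

/* Explicitly copies input_a into a float array first, which costs
   n*k*sizeof(float) extra bytes. Assumes n > 0 and k > 0.
   Returns 0 on success, -1 if the allocation fails. */
int matmul_with_float_copy(const int8_t *input_a, const float *input_b,
                           float *output, int n, int k, int m)
{
    size_t count = (size_t) n * (size_t) k;
    float *a_copy = malloc(count * sizeof *a_copy);
    if (a_copy == NULL)
        return -1;
    for (size_t t = 0; t < count; ++t)
        a_copy[t] = (float) input_a[t];
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            float sum = 0.0f;
            for (int l = 0; l < k; ++l)
                sum += a_copy[i*k + l] * input_b[l*m + j];
            output[i*m + j] = sum;
        }
    }
    free(a_copy);
    return 0;
}
```

Both functions fill output with the same values; only the second one ever holds the whole matrix as float, and only because it calls malloc itself.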