Memory reading time

I heard that reading one byte from non-cached memory can take up to 50 CPU cycles.

So, does reading int from memory take 4 times as long as reading char, meaning up to 200 cycles?

If not, is it a good idea to get 4-8 chars at a time with *(size_t *)str, saving 150-350 CPU cycles? I imagine endianness might become an issue in that case.

Also, what about local variables used in a function? Do they all get placed in registers, or get inlined if possible, or at least get cached in L1?

Solution

I heard that reading one byte from non-cached memory can take up to 50 CPU cycles.

This would be a characteristic of a specific CPU, but in a general sense, yes, reading from non-cached memory is costly.

So, does reading int from memory take 4 times as long as reading char, meaning up to 200 cycles?

You seem to be assuming that int is 4 bytes wide, which is common but not guaranteed. Either way, no: reading four consecutive bytes from memory does not typically take four times as many cycles as reading one. Memory is ordinarily read in blocks larger than one byte, at least the machine word size (probably 4 bytes on a machine with 4-byte ints), and an uncached read typically causes the whole cache line around the requested location to be loaded into cache, so that subsequent accesses to nearby locations are faster.
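To make the cache-line point concrete, here is a minimal sketch that counts how many cache lines a read of a given size covers. The 64-byte line size and the helper name lines_touched are assumptions for illustration; the real line size depends on your CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; 64 is common but not universal. */
#define ASSUMED_LINE_SIZE 64u

/* Hypothetical helper: number of cache lines covered by a read of
   `size` bytes starting at address `addr` (size must be > 0). */
static size_t lines_touched(uintptr_t addr, size_t size)
{
    uintptr_t first = addr / ASSUMED_LINE_SIZE;
    uintptr_t last  = (addr + size - 1) / ASSUMED_LINE_SIZE;
    return (size_t)(last - first + 1);
}
```

Under that assumption, a 1-byte read and a 4-byte read at the same address usually fall in the same line, so the cost of the miss is about the same. Only a read that straddles a line boundary (for example 4 bytes starting at offset 62) covers two lines.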

If not, is it a good idea to get 4-8 chars at a time with *(size_t *)str, saving 150-350 CPU cycles? I imagine endianness might become an issue in that case.

No, it is not a good idea. Modern compilers are very good at generating fast machine code, and modern CPUs and memory controllers are designed with typical program behaviors in mind. The kind of micro-optimization you describe is unlikely to help performance, and it might even hurt by inhibiting optimizations that your compiler would perform for code written more straightforwardly.
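As an illustration of "code written more straightforwardly", here is a plain byte-at-a-time loop. The function name count_byte is made up for this sketch. An optimizing compiler may turn a loop like this into wider loads or vector instructions on its own, though whether it does depends on the compiler, flags, and target.

```c
#include <stddef.h>

/* Hypothetical example: count occurrences of `target` in the first
   `n` bytes of `s`, reading one char per iteration. */
static size_t count_byte(const char *s, size_t n, char target)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == target)
            count++;
    }
    return count;
}
```

For example, count_byte("banana", 6, 'a') returns 3.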

Moreover, your particular proposed expression has undefined behavior in C if str points (in)to an array of char instead of to a bona fide size_t, as I take to be your intention: it accesses char objects through an lvalue of an incompatible type (a strict-aliasing violation), and str might not be suitably aligned for size_t. It might nevertheless produce the result you expect, but it might also do any number of things you wouldn't like, such as crash the program.
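If you really do need several bytes of a char buffer as one integer, copying them with memcpy is well defined regardless of alignment. The sketch below uses a hypothetical helper named load_word and assumes at least sizeof(size_t) readable bytes at p. Compilers commonly reduce a fixed-size memcpy like this to a single load, but that is not guaranteed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy sizeof(size_t) bytes starting at `p`
   into a size_t. Unlike *(size_t *)p, this has defined behavior
   even when `p` is not aligned for size_t. The caller must ensure
   that at least sizeof(size_t) bytes are readable at `p`. */
static size_t load_word(const char *p)
{
    size_t w;
    memcpy(&w, p, sizeof w);
    return w;
}
```

The numeric value of the result still depends on the machine's byte order, so the endianness concern from the question applies here too.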

Also, what about local variables used in a function? Do they all get placed in registers, or get inlined if possible, or at least get cached in L1?

Again, your compiler is very good at generating machine code. Exactly what it does with local variables is compiler-specific, however, and probably varies a bit from function to function. It is in any case outside your control. One writes in C instead of assembly because one wants to leave such considerations to the compiler.

In general:

Write clear code using good algorithms.
Rely on your compiler to optimize it.
If the result is not fast enough, then profile it to find the slowest parts, and work on those.
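A dedicated profiler will tell you more, but as a rough first measurement you can time a piece of code with the standard clock() function. In this sketch, time_call and sample_work are hypothetical names, and sample_work stands in for whatever code you want to measure.

```c
#include <time.h>

/* Volatile sink so the compiler cannot discard the sample workload. */
static volatile unsigned long sink;

/* Hypothetical stand-in for the code being measured. */
static void sample_work(void)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        total += i;
    sink = total;
}

/* Hypothetical helper: call `fn` `repeats` times and return the
   processor time used, in seconds, as reported by clock(). */
static double time_call(void (*fn)(void), int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_call(sample_work, 1000) gives the processor time for 1000 runs. The resolution of clock() is coarse, so repeat the work enough times for the total to be meaningful.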