
Relation between computer architecture and cache block size


Suppose memory is byte-addressable and the cache block size is 4 bytes, so one cache access fetches one block. Does that mean the computer architecture is 32-bit? My question is: what can you infer about a computer's architecture if you are given only the cache block size?


Solution

  • No. Cache block size is usually larger than the register width, to take advantage of the spatial locality that is typical between nearby full-register-width loads / stores. Making a cache as fine-grained as 4-byte chunks costs a large amount of overhead (tags and so on) compared to the amount of storage needed for the actual data. E.g. 20 tag bits, plus "dirty" and other MESI state, per 32-bit cache line might mean that a 32 KiB (usable space) cache needs more like 56 KiB of raw SRAM storage, and that's without considering ECC or parity.
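    To make that overhead arithmetic concrete, here is a rough sketch. It assumes 20 tag bits plus 4 bits of state (dirty / MESI and so on) per line; the 4-bit state figure is my assumption, chosen because it reproduces the 56 KiB estimate. The helper name `raw_sram_bits` is made up for illustration. For a fixed cache size and associativity, the tag width doesn't change with line size, so the same 20 bits are reused for the 64-byte comparison.

```python
KIB = 1024

def raw_sram_bits(data_bytes, line_bytes, tag_bits, state_bits):
    """Total SRAM bits: data plus per-line tag and state metadata."""
    lines = data_bytes // line_bytes
    return lines * (line_bytes * 8 + tag_bits + state_bits)

usable = 32 * KIB  # 32 KiB of usable data storage

# Assumed metadata per line: 20 tag bits + 4 state bits (hypothetical split).
tiny_kib = raw_sram_bits(usable, 4, 20, 4) / 8 / KIB   # 4-byte lines
wide_kib = raw_sram_bits(usable, 64, 20, 4) / 8 / KIB  # 64-byte lines

print(f"4-byte lines:  {tiny_kib} KiB raw")
print(f"64-byte lines: {wide_kib} KiB raw")
```

    With these assumptions, 4-byte lines need 56 KiB of raw storage for 32 KiB of data, while 64-byte lines need only 33.5 KiB.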

    If a CPU has a floating-point unit, it can often do 64-bit loads/stores, even if the integer register width is only 32-bit. (Or even wider with SIMD, or load-pair / store-pair instructions.)

    Typical real-world cache line sizes are 64 bytes on modern systems, and were 32 bytes on earlier CPUs like Pentium III. 64 bytes is the DDR SDRAM burst size, so it's a good choice for the size of off-chip memory accesses. (Recent Intel systems with AVX-512 SIMD can load/store a whole 64-byte (512-bit) cache line with a single instruction, though. SIMD vector width has caught up to cache line size. But integer accesses are still at most 8 bytes wide.)

    There's no fixed relationship between cache block size and architecture bitness. You definitely want the block size to be at least as wide as a normal load / store, but it would be possible to build a 64-bit machine with 32-bit cache blocks. That would mean every 64-bit load takes two cache accesses, so it would be a really bad idea unless your usual workload consisted of using 64-bit addresses in registers to access scattered 32-bit values, and you wanted to optimize for that without caring about the efficiency of anything else.
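    As a small illustration of the "two cache accesses" point, this sketch counts how many cache blocks a single load spans. `blocks_touched` is a hypothetical helper, and the model is deliberately simple: it only looks at which block indices the byte range covers.

```python
def blocks_touched(addr, size, block_bytes):
    """Number of cache blocks covered by a `size`-byte access at `addr`."""
    first = addr // block_bytes
    last = (addr + size - 1) // block_bytes
    return last - first + 1

# Aligned 64-bit (8-byte) load on a machine with 4-byte cache blocks.
load64_small_blocks = blocks_touched(0x1000, 8, 4)

# Aligned 32-bit (4-byte) load with the same 4-byte blocks.
load32_small_blocks = blocks_touched(0x1000, 4, 4)

# Aligned 64-bit load with typical 64-byte blocks.
load64_big_blocks = blocks_touched(0x1000, 8, 64)

print(load64_small_blocks, load32_small_blocks, load64_big_blocks)
```

    Under this model the 8-byte load spans two 4-byte blocks, while a 4-byte load, or any aligned 8-byte load with 64-byte blocks, stays within one.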

    Most 64-bit ISAs can work with 32 or 64-bit data equally efficiently. Some, notably x86-64, don't even have what you'd call a "word size". There's no one native access size that's most efficient on x86-64, and instructions are an unaligned byte stream, not like ISAs with aligned 32-bit instruction words like RISC-V or AArch64.

    So if you knew that the cache block size was 32-bit, it would be a good guess that the register width was at most 32-bit, but could be 8 or 16-bit. (Or 4-bit or possibly even 6-bit or something? With sizes smaller than 32-bit, for historical CPUs it often becomes a question of what one means by bitness: ALU, register, bus, fixed-width instruction? Notice that in earlier parts of the answer, I just talked about register width, not "32-bit CPU".)

    If this was a real commercial design instead of a computer science example, an 8-bit machine would be the most likely; a normal 32-bit machine would use larger cache blocks but you could plausibly imagine finer granularity on a machine that could only load 1 byte at a time. (Of course, being an 8-bit machine doesn't imply that restriction; you could have a load-pair instruction, or FP registers that allow 32-bit or 64-bit loads/stores.)