Let's say I have a CPU with 32 cores and a huge 120 MB L3 cache. If I run some memory-heavy code that executes on only one core, can that single core benefit from the whole L3 cache? As far as I know, L3 is shared between cores in most modern x86 CPUs...
So I'd say yes, it benefits from it, but I'm not sure... This would imply that many-core CPUs with huge L3 caches could in fact speed up single-threaded execution for some memory-heavy workloads.
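On AMD Zen, no. Each CCX (core complex) of 4 cores has its own private L3 that is independent of the L3 in other CCXs, so a single thread can only use the L3 of its own CCX as capacity, not the chip's total. (Zen 3 enlarged the CCX to 8 cores sharing one L3, but the same per-CCX limit applies.)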
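A back-of-the-envelope sketch of what this means for capacity, assuming total L3 is split evenly across independent L3 domains and that a thread can only allocate into the domain of the core it runs on. The `L3Topology` class, the `fits_in_l3` helper and the numbers are hypothetical: the question's 32-core/120 MB part modelled as one shared L3, versus a Zen 2-like part with 8 CCXs of 16 MiB each. Real "fit" also depends on associativity, other threads and prefetching, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class L3Topology:
    """Simplified model (assumption): total L3 split evenly across independent domains."""
    total_l3_mib: float
    l3_domains: int  # 1 = one L3 shared by the whole package; >1 = e.g. one L3 per CCX

    def reachable_by_one_core(self) -> float:
        # A single thread can only allocate into the L3 of its own domain.
        return self.total_l3_mib / self.l3_domains


def fits_in_l3(working_set_mib: float, topo: L3Topology) -> bool:
    """True if the working set fits in the L3 capacity one core can use (ignores conflicts)."""
    return working_set_mib <= topo.reachable_by_one_core()


# Hypothetical parts: the question's 120 MB chip as one shared L3,
# and a Zen 2-like 32-core chip built from 8 CCXs x 16 MiB.
intel_like = L3Topology(total_l3_mib=120, l3_domains=1)
zen2_like = L3Topology(total_l3_mib=128, l3_domains=8)

for name, topo in [("intel_like", intel_like), ("zen2_like", zen2_like)]:
    # Prints "intel_like 120.0 True" then "zen2_like 16.0 False"
    print(name, topo.reachable_by_one_core(), fits_in_l3(60, topo))
```

So a 60 MiB working set would stay in L3 on the shared-L3 design but spill to DRAM on the Zen 2-like one, even though its total L3 is larger.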
On Intel CPUs, yes: L3 is shared by all cores in a socket/package. More cores, each with its own slice of L3, means a larger ring bus or mesh, and therefore higher L3 latency, but more total capacity.
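On Linux you can check which cores share an L3 by reading the cache descriptions in sysfs (`/sys/devices/system/cpu/cpuN/cache/indexM/`, with its `level`, `size` and `shared_cpu_list` files). A minimal sketch, assuming those files are exposed (some VMs and containers hide them); the function names are hypothetical. On a Zen part you'd expect one group per CCX, on an Intel part one group per socket.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group CPUs by the L3 instance they share: {frozenset(cpu ids): size string}."""
    domains = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for cache_dir in glob.glob(pattern):
        try:
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue  # only interested in L3
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                shared = frozenset(parse_cpu_list(f.read()))
            with open(os.path.join(cache_dir, "size")) as f:
                size = f.read().strip()  # e.g. "16384K"
        except OSError:
            continue  # cache info not exposed for this CPU
        if shared:
            # Every CPU in the group reports the same set, so duplicates collapse here.
            domains[shared] = size
    return domains


if __name__ == "__main__":
    doms = l3_domains()
    print(f"{len(doms)} L3 domain(s)")
    for cpus, size in sorted(doms.items(), key=lambda kv: min(kv[0])):
        print(f"  CPUs {sorted(cpus)}: {size}")
```

The number of groups is the number of separate L3s; the `size` of one group is what a single thread can actually use as L3 capacity.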
If 8 MiB of L3 were enough for most of the accesses from some single-threaded program, it would probably run faster on a quad-core "client" i7 chip than on a big 32-core Xeon, assuming both ran at the same clock speed, because the smaller chip's L3 latency is lower. Related: Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?