Compute capability of a small (1mm^2) ASIC

I was watching a recent ACM Turing Lecture by Hennessy and Patterson and was intrigued by a figure they cited on the cost of small-chip tape-outs. They claimed that you can tape out 100 chips of 1 mm x 1 mm each at the 28 nm process node for $14,000, presumably on a test shuttle.

My question is: if I wanted to fill this chip area with MAC units (say 16- or 32-bit), how many MACs could I perform simultaneously per cycle?

Solution

Just as a back-of-the-envelope calculation: this paper describes a 32x32->64 multiplier as being 435 um x 482 um in Synopsys' 90 nm educational technology. If you just trivially scale the area to 28 nm (multiplying by (28/90)^2), you get about 0.02 mm^2 per instance, or roughly 50 multipliers per mm^2. That's probably within an order of magnitude, which is good enough, because "multipliers per mm^2" isn't really a meaningful metric: the interesting part is how to get data into and out of such a multiplier array, and that circuitry will dominate the area of the multipliers themselves.
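The scaling above can be written out as a short Python sketch. It assumes ideal scaling (area proportional to the square of the node size), which real 28 nm cells won't follow exactly, and it counts bare multipliers only (no accumulators, pipeline registers, or routing), so treat the result as an upper bound. The 16-bit figure additionally assumes that multiplier area grows with the square of operand width; that assumption is not from the cited paper.

```python
# Back-of-the-envelope multiplier density at 28 nm.
# Assumption: ideal scaling, i.e. area scales with (new node / old node)^2.

MULT_W_UM = 435.0     # 32x32->64 multiplier width, 90 nm educational library (um)
MULT_H_UM = 482.0     # 32x32->64 multiplier height (um)
SRC_NODE_NM = 90.0
DST_NODE_NM = 28.0
DIE_AREA_MM2 = 1.0


def scaled_area_mm2(w_um: float, h_um: float, src_nm: float, dst_nm: float) -> float:
    """Area at the destination node under ideal quadratic scaling."""
    area_mm2 = (w_um * h_um) / 1e6  # um^2 -> mm^2
    return area_mm2 * (dst_nm / src_nm) ** 2


area32 = scaled_area_mm2(MULT_W_UM, MULT_H_UM, SRC_NODE_NM, DST_NODE_NM)

# Assumption (not from the paper): an array multiplier's area grows roughly
# with the square of the operand width, so a 16x16 unit is ~1/4 the size.
area16 = area32 * (16 / 32) ** 2

print(f"32-bit: {area32:.4f} mm^2 each -> ~{int(DIE_AREA_MM2 // area32)} per mm^2")
print(f"16-bit: {area16:.4f} mm^2 each -> ~{int(DIE_AREA_MM2 // area16)} per mm^2")
```

This gives about 49 32-bit or about 197 16-bit multipliers per mm^2, before any of the surrounding datapath is accounted for.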

For another reference, the FU540-C000 is 30 mm^2 in TSMC's 28 nm HPC process. Yunsup's Hot Chips presentation from last year includes a fairly detailed die plot on page 17, from which you can estimate what 1 mm^2 buys you in a modern technology: quite a bit of SRAM and logic, but not many pads.
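To make the "not many pads" point concrete, here is a Python sketch that counts how many pads fit in a single ring around a 1 mm x 1 mm die and compares that with the operand traffic a die full of multipliers would need. The pad pitch and corner keep-out are hypothetical placeholders, not TSMC or shuttle figures; the multiplier size is the rough 0.02 mm^2 estimate from the answer.

```python
import math

# Hypothetical pad-ring parameters, for illustration only; real values come
# from the foundry or shuttle pad library.
PAD_PITCH_UM = 60.0
CORNER_KEEPOUT_UM = 100.0


def perimeter_pads(die_side_mm: float, pad_pitch_um: float, corner_keepout_um: float) -> int:
    """Pads that fit in one ring around a square die, skipping the corners."""
    usable_edge_um = die_side_mm * 1000.0 - 2 * corner_keepout_um
    pads_per_edge = math.floor(usable_edge_um / pad_pitch_um)
    return 4 * max(pads_per_edge, 0)


pads = perimeter_pads(1.0, PAD_PITCH_UM, CORNER_KEEPOUT_UM)

# Operand demand if the die were filled with ~0.02 mm^2 32-bit multipliers,
# each consuming two fresh 32-bit operands every cycle.
n_mults = round(1.0 / 0.02)
operand_bits_per_cycle = n_mults * 2 * 32

print(f"{pads} pads in total (before reserving power/ground) vs "
      f"{operand_bits_per_cycle} operand bits needed per cycle")
```

Under these assumptions the ring holds 52 pads against 3200 operand bits per cycle, roughly a 60x shortfall at one bit per pad per core cycle, which is why on-chip data reuse, rather than the multipliers, ends up dominating the design.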