Generics using macro expansion: why the poor use of the instruction cache?

Russ Cox (Go core dev), in his blog post on generic implementations, said the following about the macro expansion (C++) approach to implementing generics:

The individual specializations may be efficient but the program as a whole can suffer due to poor use of the instruction cache

I have been thinking about this statement for days but fail to understand the reason behind it. Is there any sound reason behind this assumption?

I understand that the instruction cache will be used (relatively) poorly if we have:

Poor locality of reference for instructions
Poor branch prediction

Does macro expansion cause either of these?

Update: There is a similar discussion in the Go Generic Discussion Summary, where the technique used in C++ is generalised as Type Specialization.

Solution

Each instantiation of a template is effectively a separate class. If you use both a std::vector<int> and a std::vector<float>, you end up with two complete copies of the vector implementation. (Most standard libraries try to move as much code as possible out of the vector into a base class or private implementation class for this reason, but some duplication remains.) With up to twice as much code being executed, the instruction cache fills up to twice as fast, and the copies compete with each other for cache space.
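As a rough illustration of the point above, here is a minimal sketch. The names StackBase, Stack, kCapacity and instance_tag are made up for this example and do not reflect how any particular standard library is written. The type-independent members live in a non-template base and exist once, while everything inside the template is generated again for each T that is used. The function-local static is only there to make the separation observable: each instantiation gets its own copy.

```cpp
#include <cassert>
#include <cstddef>

// Non-template base (hypothetical): these members are compiled once and
// shared by every Stack<T>, whatever T is.
class StackBase {
public:
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

protected:
    std::size_t size_ = 0;
};

// Template part: the compiler generates a separate push/pop for each T used.
template <typename T>
class Stack : public StackBase {
public:
    static constexpr std::size_t kCapacity = 16;  // fixed size keeps the sketch short

    void push(const T& value) {
        assert(size_ < kCapacity);
        items_[size_++] = value;
    }

    T pop() {
        assert(size_ > 0);
        return items_[--size_];
    }

    // Each instantiation owns a distinct static, which shows that Stack<int>
    // and Stack<float> are separate entities rather than one shared class.
    static int& instance_tag() {
        static int tag = 0;
        return tag;
    }

private:
    T items_[kCapacity]{};
};
```

Using both Stack<int> and Stack<float> in one program therefore asks the compiler for two versions of push and pop, but only one version of size and empty. Whether the duplicated machine code really ends up in the binary depends on the toolchain (inlining, or a linker that folds identical functions, can change the picture), so treat this as a sketch of the mechanism rather than a measurement.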