I want to implement a TCP/IP communication stack in C or C++, as efficiently as possible. It really must run as fast as possible.
Does anyone have a good example or suggestion of where to start?
As Steve points out in the comments, you do need quite a bit of experience to do this well. So rather than jumping directly to your end goal, I recommend these possible steps:
Linux is a good option as the details you need are easily accessible and documented.
And oh yeah, stop as soon as you realize you're unlikely to outperform the Linux kernel.
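If you do go ahead, one small piece you can write and test in isolation is the Internet checksum (RFC 1071), which the IPv4, TCP and UDP headers all use. Below is a minimal sketch in C, not a tuned implementation; the function name `inet_checksum` is my own choice for illustration, and it assumes the checksum field in the buffer has been zeroed before the call.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the RFC 1071 Internet checksum.
 * "inet_checksum" is a hypothetical name, not an existing library call.
 * The buffer is read as big-endian 16-bit words; the result is returned
 * as a host-order value and must be stored big-endian in the header.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0; /* wide accumulator so carries cannot be lost */

    /* Add up all complete 16-bit words. */
    while (len > 1) {
        sum += (uint64_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint64_t)(data[0] << 8);

    /* Fold the carries back into the low 16 bits (one's complement add). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

A handy self-check: running the function over a header that already contains its correct checksum should return 0. A fast version would process wider words and unroll the loop, but get a plain one right first so you have something to compare against.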