x86 intel cpu-architecture boot multicore

How is the bootstrap processor (BSP) selected on Intel ring and mesh architectures?

Section 2.13.2 mentions that the arbitration ID is used to determine which processor issues the no-op cycle first, and I have seen this in multiple sources as well as the Intel manual. The part of the Intel manual that covers the MP initialisation sequence only addresses the Pentium 4, when there was a 'system bus', and before that there was originally an 'APIC bus'. I am under the impression that the arbitration ID was only needed on those architectures where multiple CPUs shared the same bus. With the ring bus architecture, a stop senses an empty slot on the ring, places its transaction in it, and the transaction moves round at one stop per cycle, meaning that kind of arbitration is no longer required.

What's interesting is that Section 2.13.2 is part of a document that discusses Intel ME and the PCH, so it is obviously about Nehalem and later. Since it says that the APIC ArbID is used, though, perhaps it is indeed only talking about Nehalem or Westmere.

So I ask, how is the BSP selected on ring and indeed mesh architectures? My thought was that the cores could use cache-as-RAM and, if cache coherency still functions in no-fill mode, race for a mutex.

Solution

I assume it's just hard-wired that one of the cores is the BSP. I don't think the other cores even power up until you send them an IPI, and they certainly wouldn't be running code that tries to take a mutex in cache to sort this out. The other cores probably come up in a HALT-like state that waits for an interrupt.

(But probably a deep sleep C-state like C7 or something, unlike the actual HALT instruction, so if the OS never wakes up some of the cores, putting the woken cores to sleep can let the whole package go into a deep sleep state.)

For multi-socket systems, presumably one socket is special somehow.
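
However the BSP ends up being chosen, software can at least observe the result: the BSP flag is bit 8 of the IA32_APIC_BASE MSR (address 0x1B), and it is set only on the bootstrap processor. Below is a minimal sketch of decoding that MSR, not anything taken from Intel's documents. The helper names `decode_apic_base` and `read_apic_base` are made up for illustration, the 36-bit `maxphyaddr` default is an assumption (the real width comes from CPUID), and the read path assumes Linux with the `msr` driver loaded and root privileges.

```python
import os
import struct

IA32_APIC_BASE_MSR = 0x1B        # MSR address of IA32_APIC_BASE

BSP_FLAG = 1 << 8                # set only on the bootstrap processor
X2APIC_ENABLE = 1 << 10          # x2APIC mode enabled
APIC_GLOBAL_ENABLE = 1 << 11     # local APIC globally enabled


def decode_apic_base(value, maxphyaddr=36):
    """Split a raw IA32_APIC_BASE value into its fields.

    maxphyaddr is an assumed physical address width; adjust it for the
    actual CPU.
    """
    base_mask = ((1 << maxphyaddr) - 1) & ~0xFFF  # bits 12..maxphyaddr-1
    return {
        "is_bsp": bool(value & BSP_FLAG),
        "x2apic": bool(value & X2APIC_ENABLE),
        "apic_enabled": bool(value & APIC_GLOBAL_ENABLE),
        "apic_base": value & base_mask,
    }


def read_apic_base(cpu):
    """Read IA32_APIC_BASE on one logical CPU via /dev/cpu/<cpu>/msr.

    Hypothetical helper: needs Linux, `modprobe msr`, and root.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the MSR number.
        raw = os.pread(fd, 8, IA32_APIC_BASE_MSR)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]
```

For example, `decode_apic_base(0xFEE00900)` reports `is_bsp` as true with the APIC at its usual base of 0xFEE00000, while `0xFEE00800` decodes as an application processor. On a live Linux box, looping `read_apic_base(cpu)` over the online CPUs should show exactly one with the flag set. It tends to be CPU 0, but I wouldn't rely on that.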