
Implementing SMP properly on a Linux/MIPS platform


I have been trying to get SMP support working again on a port of Linux/MIPS kernel to the SGI Octane (IP30) for the last few weeks now. Uniprocessor support works fine, but I am running into a lot of problems working with the second CPU. I can boot the machine to the init process, but that dies with either a SIGSEGV or SIGBUS a majority of the time. I have most of the support code in place from patches written 5+ years ago, but I suspect I am either not locking things properly or I am re-enabling IRQs unexpectedly.


Some background on the hardware:

The MIPS R10000-series CPU implements 8 interrupts, IP0 to IP7:

  • IP0 and IP1: Software interrupts only and are currently not used for much.
  • IP2 to IP6: Generally routed to some other hardware function for handling.
  • IP7: The R10K timer/counter/compare interrupt.

  • R10K supports the MIPS-IV ISA, and has both an I-cache and D-cache.

    • I-cache is 32kB, VIPT, 2-way, and 64-byte linesize.
    • D-cache is 32kB, VIPT, 2-way, no aliases, and 32-byte linesize.
  • R10K L2 cache is 2MB, 2-way, and 128-byte linesize.
  • R10K is superscalar, employs speculative execution, and can execute out-of-order.
  • Octane is cache-coherent, thus does not suffer from the effects of speculative execution.
  • Specifically, I have an R14000 dual module in this system. Not much is known about it other than it's mainly an R10K with a die shrink and faster clockspeeds. SGI has never released hardware datasheets on this processor, nor any errata information.


Octane has an ASIC called HEART as both its memory controller and interrupt controller. HEART was designed to support up to 4 processors and has 64 interrupts (IRQs) available. These 64 IRQs are divided into several priority levels and are mapped to the R10K CPU IPx IRQs above:

  • Level 0, IRQs 0 to 15 -> CPU IP2
  • Level 1, IRQs 16 to 31 -> CPU IP3
  • Level 2, IRQs 32 to 49 -> CPU IP4
  • Level 3, IRQ 50 -> CPU IP5
  • Level 4, IRQs 51 to 63 -> CPU IP6
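
The level-to-IPx mapping above can be written as a small lookup. This is only a userspace sketch: `heart_irq_to_cpu_ip` is a hypothetical helper name that does not come from the port, and it assumes Level 2 starts at IRQ 32, as the notes on that level indicate.

```c
#include <assert.h>

/*
 * Map a HEART IRQ number (0..63) to the CPU interrupt line (IPx) it is
 * routed to. Hypothetical helper for illustration only.
 */
static int heart_irq_to_cpu_ip(int irq)
{
    assert(irq >= 0 && irq <= 63);

    if (irq <= 15)
        return 2;   /* Level 0 */
    if (irq <= 31)
        return 3;   /* Level 1 */
    if (irq <= 49)
        return 4;   /* Level 2 (assumed to start at IRQ 32) */
    if (irq == 50)
        return 5;   /* Level 3: HEART timer */
    return 6;       /* Level 4: error IRQs */
}
```

For example, the IPI for CPU 0 (IRQ 46) arrives on IP4, the same line as the power button and debugger IRQs.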


There are some notes about these priority levels:

  • Level 0 and Level 1 IRQs are primarily assigned to devices in the system (SCSI, ethernet, etc).

  • Level 2 has several uses:

    • IRQs 32 to 40 are also available for use by devices in the system (especially those that need a higher priority).
    • IRQ 41 is hardwired for power button presses.
    • IRQs 42 to 45 are for debugger signals to the 4 possible CPUs.
    • IRQs 46 to 49 are SMP interprocessor interrupts (IPI) for the 4 possible CPUs.

  • Level 3, IRQ 50, is specifically for the counter/compare timer on the HEART itself. It runs at 12.5MHz (80ns, I think). It has a single count register and compare register. From a Linux clockevent standpoint, I think this is a better resolution timer for use as the system timer (52-bit counter, 24-bit compare).

  • Level 4 is for error IRQs:

    • IRQs 51 to 58 are error IRQs for each of the 8 available Xtalk widgets on the XIO Bus (a high-speed bus arranged in a star topology, serviced by the XBOW ASIC).
    • IRQs 59 to 62 are bus error IRQs for the 4 possible CPUs.
    • IRQ 63 is the exception error IRQ for HEART itself.
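
To put numbers on the HEART timer described under Level 3: at 12.5MHz one tick is exactly 80ns, so a 24-bit compare field spans roughly 1.34 seconds. The sketch below only does that arithmetic; the macro and function names are made up for illustration and are not part of the port.

```c
#include <assert.h>
#include <stdint.h>

#define HEART_TIMER_HZ   12500000ULL    /* 12.5MHz, from the description above */
#define NS_PER_SECOND    1000000000ULL
#define HEART_NS_PER_TICK (NS_PER_SECOND / HEART_TIMER_HZ)  /* 80, exact */

/* Convert a HEART tick count to nanoseconds (hypothetical helper). */
static uint64_t heart_ticks_to_ns(uint64_t ticks)
{
    /* Guard against overflow of the 64-bit multiplication. */
    assert(ticks <= UINT64_MAX / HEART_NS_PER_TICK);
    return ticks * HEART_NS_PER_TICK;
}

/* Time covered by the full range of the 24-bit compare field. */
static uint64_t heart_compare_span_ns(void)
{
    return heart_ticks_to_ns(1ULL << 24);
}
```

A clockevent built on this timer could therefore not be programmed much more than about 1.3 seconds ahead without extra handling.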

HEART presents several registers for working with interrupts. Each register is 64 bits wide, one bit per interrupt:

  • HEART_ISR - Read-only register to get the list of pending interrupts.
  • HEART_SET_ISR - Write-only register to set a specific interrupt bit.
  • HEART_CLR_ISR - Write-only register to clear a specific interrupt bit.
  • HEART_IMR(x) - Read/write register to set or clear the interrupt mask for a specific interrupt on a specific CPU, represented by x.


I use the following code for the basic IRQ ack/mask/unmask operations:

u64 *imr;                       /* Address of the mask register to work on */
static int heart_irq_owner[64]; /* Which CPU owns which IRQ? (global) */

Ack:    writeq((1UL << irq), HEART_CLR_ISR);

Mask:   imr = HEART_IMR(heart_irq_owner[irq]);
        writeq(readq(imr) & (~(1UL << irq)), imr);

Unmask: imr = HEART_IMR(heart_irq_owner[irq]);
        writeq(readq(imr) | (1UL << irq), imr);


These basic operations are implemented using the struct irq_chip accessors within the 3.1x-series Linux kernel, and I protect access to the HEART registers using spin_lock_irqsave and spin_unlock_irqrestore. I am not 100% certain if I should be using those locking functions in these accessors.
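
Mask and unmask are read-modify-write sequences on a register that another CPU may also be updating, which is why they need some form of serialization. The sketch below models only the bit manipulation against a mock array standing in for the per-CPU HEART_IMR(x) registers. It is single-threaded userspace code with no locking, and all `mock_` names are hypothetical. It uses `1ULL` so the shift is 64 bits wide regardless of the size of `long`.

```c
#include <assert.h>
#include <stdint.h>

#define HEART_MAX_CPUS 4

/* Mock storage standing in for the per-CPU HEART_IMR(x) registers. */
static uint64_t mock_imr[HEART_MAX_CPUS];

static void mock_check(int cpu, int irq)
{
    assert(cpu >= 0 && cpu < HEART_MAX_CPUS);
    assert(irq >= 0 && irq < 64);
}

/* Clear the IRQ's bit in the owning CPU's mask (read-modify-write). */
static void mock_heart_mask(int cpu, int irq)
{
    mock_check(cpu, irq);
    mock_imr[cpu] &= ~(1ULL << irq);
}

/* Set the IRQ's bit in the owning CPU's mask (read-modify-write). */
static void mock_heart_unmask(int cpu, int irq)
{
    mock_check(cpu, irq);
    mock_imr[cpu] |= 1ULL << irq;
}

/* Nonzero if the IRQ is currently enabled for the given CPU. */
static int mock_heart_is_enabled(int cpu, int irq)
{
    mock_check(cpu, irq);
    return (mock_imr[cpu] >> irq) & 1;
}
```

If two CPUs interleaved the read and the write of the same register without a lock, one update could be lost, which would show up as an IRQ that stays masked or unmasked unexpectedly.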



For processing all interrupts, the standard Linux/MIPS platform dispatch function takes the following actions:

  • IP7 -> Calls do_IRQ() to handle the CPU timer IRQ.
  • IP6 -> Calls ip30_do_error_irq() to report any HEART errors to syslog.
  • IP5 -> Calls do_IRQ() to handle the clockevent IRQ assigned to the HEART timer.
  • IP4, IP3, and IP2 -> Calls ip30_do_heart_irq() to handle all HEART IRQs from 0 to 49.


This is the code currently used for ip30_do_heart_irq():

static noinline void ip30_do_heart_irq(void)
{
    int irqnum = 49;
    int cpu = smp_processor_id();
    u64 heart_isr = readq(HEART_ISR);
    u64 heart_imr = readq(HEART_IMR(cpu));
    u64 irqs = (heart_isr & 0x0003ffffffffffffULL &
                heart_imr);

    /* Poll all IRQs in decreasing priority order */
    do {
        if (irqs & (1UL << irqnum))
            do_IRQ(irqnum);
        irqnum--;
    } while (likely(irqnum >= 0));
}
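
As a possible variation on the polling loop above, the dispatcher could visit only the bits that are actually set instead of testing all 50 positions. The following is a userspace sketch, not the port's code: `heart_dispatch_pending` and `record_irq` are hypothetical names, the handler callback stands in for do_IRQ(), and `__builtin_clzll` is a GCC/Clang builtin (the kernel has its own helpers such as fls64() for finding the highest set bit).

```c
#include <assert.h>
#include <stdint.h>

#define HEART_L0_L2_MASK 0x0003ffffffffffffULL  /* IRQs 0..49 */

typedef void (*heart_irq_fn)(int irq);

/* Call handle() for each pending, unmasked IRQ, highest number first. */
static void heart_dispatch_pending(uint64_t isr, uint64_t imr,
                                   heart_irq_fn handle)
{
    uint64_t pending = isr & imr & HEART_L0_L2_MASK;

    assert(handle != 0);
    while (pending) {
        int irq = 63 - __builtin_clzll(pending); /* index of highest set bit */

        pending &= ~(1ULL << irq);
        handle(irq);
    }
}

/* Demonstration handler: records the order in which IRQs were seen. */
static int seen[64];
static int nseen;

static void record_irq(int irq)
{
    seen[nseen++] = irq;
}
```

Calling `heart_dispatch_pending((1ULL << 49) | (1ULL << 3), ~0ULL, record_irq)` records IRQ 49 and then IRQ 3, matching the decreasing priority order of the original loop.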


When it comes to SMP support, unlike other Linux/MIPS platforms, I do not have something akin to a mailbox register in the hardware to store what kind of IPI action should be taken. The original code uses a global int array (ip30_ipi_mailbox), indexed by the CPUID, for specifying what IPI action to pass on to the other processor.

Additionally, even though HEART was designed to support up to 4 processors, SGI only ever produced a dual CPU module. Therefore, IRQs 44-45, 48-49, and 61-62 are never actually used for anything.

Given these global variables:

#define IPI_CPU(x) (46 + (x))
static DEFINE_SPINLOCK(ip30_ipi_lock);
static u32 ip30_ipi_mailbox[4];


This is the code currently used to send an IPI to the other CPUs:

static void ip30_send_ipi_single(int cpu, u32 action)
{
    unsigned long flags;

    spin_lock_irqsave(&ip30_ipi_lock, flags);
    ip30_ipi_mailbox[cpu] |= action;
    spin_unlock_irqrestore(&ip30_ipi_lock, flags);
    writeq(1UL << IPI_CPU(cpu), HEART_SET_ISR);
}


To respond to an IPI, each CPU calls request_irq in its initialization code and registers an interrupt handler. This is the code currently used in the handler to service the IPI interrupt:

static irqreturn_t ip30_ipi_irq(int irq, void *dev_id)
{
    u32 action;
    int cpu = smp_processor_id();
    unsigned long flags;

    spin_lock_irqsave(&ip30_ipi_lock, flags);
    action = ip30_ipi_mailbox[cpu];
    ip30_ipi_mailbox[cpu] = 0;
    spin_unlock_irqrestore(&ip30_ipi_lock, flags);

    if (action & SMP_RESCHEDULE_YOURSELF)
        scheduler_ipi();

    if (action & SMP_CALL_FUNCTION)
        smp_call_function_interrupt();

    return IRQ_HANDLED;
}
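
One alternative to guarding the mailbox with a spinlock is to make the post and the drain single atomic operations. The sketch below shows the idea in userspace with C11 atomics. It is not the original port's approach, the `mbox_` names and `ACT_` values are stand-ins chosen for illustration, and a kernel version would use the kernel's own atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MBOX_CPUS          4
#define ACT_RESCHEDULE     0x1u   /* stand-in for SMP_RESCHEDULE_YOURSELF */
#define ACT_CALL_FUNCTION  0x2u   /* stand-in for SMP_CALL_FUNCTION */

/* One word of pending IPI actions per CPU. */
static _Atomic uint32_t mbox[MBOX_CPUS];

/* Sender side: atomically OR the action bits into the target's mailbox. */
static void mbox_post(int cpu, uint32_t action)
{
    assert(cpu >= 0 && cpu < MBOX_CPUS);
    atomic_fetch_or(&mbox[cpu], action);
    /* A real sender would now raise the IPI by writing HEART_SET_ISR. */
}

/* Receiver side: atomically fetch all pending actions and clear them. */
static uint32_t mbox_drain(int cpu)
{
    assert(cpu >= 0 && cpu < MBOX_CPUS);
    return atomic_exchange(&mbox[cpu], 0);
}
```

Because the exchange returns the old value and stores zero in one step, an action posted between the read and the clear cannot be lost, which is the property the lock provides in the handler above.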



And that's the background info.

My current kernel configuration has everything except the framebuffer and the "Impact" video driver stripped out. No PCI, no block layer, no networking, no serial, no keyboard/mouse. I have a ~7 year old initramfs I am loading up that, if everything works, should drop to a bash prompt. However, because it loads into RAM, it's capable of exposing memory corruption rather quickly, and I either get the aforementioned SIGSEGV or SIGBUS errors as a result.

Using remote GDB or the built-in KGDB is not an option at present because of the IOC3 PCI device. IOC3 is a multifunction PCI device that claims to be a single function device and behind it lie the hardware bits for the keyboard/mouse, serial ports, real-time clock, and the ethernet. Code does not exist yet to get around the IOC3 and access the serial ports directly for remote GDB, and KGDB doesn't know how to talk to the standard i8042 keyboard controller on the IOC3, either.

I have a standard PCI serial card added (Moschip-based), but that driver is apparently not endian safe, thus probing for serial ports panics the kernel.


Getting the following questions answered will, I hope, put me on the right path to getting SMP working by allowing me to better identify the faulty code and focus on making it work right:

  • Am I using spinlocks correctly?
  • Am I using the correct spinlock variants?
  • Do I need to add synchronization calls anywhere (i.e., smp_rmb(), smp_wmb(), etc)?
  • Could my problem lie outside of this core platform support code (such as in the video driver)?
  • Could I be looking at an unknown hardware erratum corrupting memory at random?
  • Could any of the above code be implemented better? (Much of it is code from the original port of Linux 2.6.17 to the Octane, just updated to be more in line with how other things in the kernel work.)

Any information that can put me on the right path to figuring this out would be appreciated. My hope is to get SMP into a usable state (efficiency is irrelevant, I just need it to work), so I can start working on breaking things up into patches and see about getting it included in the mainline kernel at some point. If I can't get SMP to work, I'll just drop its support and focus on getting the uniprocessor code sent upstream instead.


Solution

  • The bug ultimately came down to not assigning the IRQ numbers to the correct flow handler. I was initially assigning ALL 64 IRQs to use handle_level_irq, which is incorrect for SMP interprocessor interrupts (IPIs). The fix was to assign the 8 CPU-specific interrupts, 42-45 and 46-49, to handle_percpu_irq instead.
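
The selection rule described in the fix can be sketched as a small classifier. This is a userspace illustration, not the actual patch: the enum and `heart_irq_flow` are hypothetical names, and in the kernel the result would decide whether handle_level_irq or handle_percpu_irq is passed when each IRQ is set up (for example through irq_set_chip_and_handler()).

```c
#include <assert.h>

/* Which generic flow handler a HEART IRQ should use (illustrative enum). */
enum heart_flow {
    HEART_FLOW_LEVEL,   /* kernel: handle_level_irq */
    HEART_FLOW_PERCPU   /* kernel: handle_percpu_irq */
};

static enum heart_flow heart_irq_flow(int irq)
{
    assert(irq >= 0 && irq <= 63);

    /* 42..45: per-CPU debugger signals; 46..49: per-CPU IPIs. */
    if (irq >= 42 && irq <= 49)
        return HEART_FLOW_PERCPU;

    return HEART_FLOW_LEVEL;
}
```

An init loop over all 64 IRQs would call this once per IRQ instead of registering every one of them with the level handler.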