Tags: c++, gcc, optimization, lpc

How is compiler optimization speeding up the time between simple operations?


I have an embedded C++ project that is using SPI. When I compile and run my program without optimizations (-O0), the peripheral device (an LCD panel) works fine. When I compile it with optimizations (-O1), the peripheral device does not work correctly.

I checked both cases with a logic analyzer, and the only difference is that the time between the bytes being written is much shorter with the optimized code (clock rate, bytes written etc. are identical). How is compiler optimization able to affect the time between subsequent operations that are simply writing to a hardware register one after the other? If I add a delay after each write command in the SPI class, it works in the optimized case.

Unlike the sample below, in the original code, the calls to WriteCommand() and WriteData() are through a pointer.

Code snippet, consecutive writes to peripheral over SPI:

{
    SPI m_spiPort;
    m_spiPort.Init();

    m_spiPort.WriteCommand(SLEEPOUT);

    // Color Interface Pixel Format (command 0x3A)
    m_spiPort.WriteCommand(COLMOD);
    m_spiPort.WriteData(0x03); // 0x03 = 12 bits-per-pixel

    // Memory access controller (command 0x36)
    m_spiPort.WriteCommand(MADCTL);
    m_spiPort.WriteData(0x00);

    // Write contrast (command 0x25)
    m_spiPort.WriteCommand(SETCON);
    m_spiPort.WriteData(0x39); // contrast 0x39

    // Display On (command 0x29)
    m_spiPort.WriteCommand(DISPON);

}

The SPI class:

class SPI {
public:
    void Init();
    void WriteCommand(unsigned int command);
    void WriteData(unsigned int data);
private:
    void Write(unsigned int value);
};

The implementation of this class is:

void SPI::WriteCommand(unsigned int command)
{
    command &= ~0x100;   //clear bit 8
    Write(command);
}

void SPI::WriteData(unsigned int data)
{
    data |= 0x100;   //set bit 8
    Write(data);
}

void SPI::Write(unsigned int value)
{
    LPC_SSP->DR = value;
}

void SPI::Init( void )
{
  LPC_SYSCON->SYSAHBCLKCTRL |= (1<<11);  //Enables clock for SPI
  LPC_SYSCON->SSPCLKDIV = 0x01;

  LPC_IOCON->PIO0_14 &= ~(0x7); // SCK
  LPC_IOCON->PIO0_14 |= 0x2;

  LPC_IOCON->PIO0_17 &= ~(0x7); // MOSI
  LPC_IOCON->PIO0_17 |= 0x2;

  /* SSP SSEL is a GPIO pin */
  LPC_IOCON->PIO0_27 = 0x0;     // configure as GPIO pin
  LPC_GPIO0->MASK = (1<<27);
  LPC_GPIO0->DIR |= (1<<27);    // set in output mode
  LPC_GPIO0->CLR = 1 << 27;

  /* Set DSS data to 9-bit, Frame format SPI, CPOL = 0, CPHA = 0, and SCR is 0 */
  LPC_SSP->CR0 = 0x0008;

  /* SSPCPSR clock prescale register, master mode, minimum divisor is 0x02 */
  LPC_SSP->CPSR = 0x4;  // SPI clock will run at 6 MHz

  /* set Master mode and enable the SPI */
  LPC_SSP->CR1 = 0x2;
}

Edit - removed DelayInCycles() from SPI::Write(). The differences are still apparent without it and I did not mean to include it in this post.


Solution

  • For each command and data byte, your code invokes two functions, neither of which has local variables or many temporaries.

    When compiled faithfully, each of those functions creates a stack frame (which takes a handful of instructions to set up and tear down) to store any local variables and temporaries that can't be held in registers. This is probably what happens at -O0.

    Two important optimizations that can affect the execution time for such code are:

    • Stack frame omission: The compiler notices that the stack frame for Write (and possibly also for WriteCommand and WriteData) is unused and decides to eliminate the instructions for setting up (and tearing down) the stack frame.
    • Function inlining: As Write, WriteCommand and WriteData are all very simple functions, the compiler can decide to eliminate the function calls altogether and to generate code as if you had written (disregarding any accessibility issues):

      {
          SPI m_spiPort;
          m_spiPort.Init();

          LPC_SSP->DR = (SLEEPOUT & ~0x100);

          // Color Interface Pixel Format (command 0x3A)
          LPC_SSP->DR = (COLMOD & ~0x100);
          LPC_SSP->DR = (0x03 | 0x100);

          // Memory access controller (command 0x36)
          LPC_SSP->DR = (MADCTL & ~0x100);
          LPC_SSP->DR = (0x00 | 0x100);

          // Write contrast (command 0x25)
          LPC_SSP->DR = (SETCON & ~0x100);
          LPC_SSP->DR = (0x39 | 0x100);

          // Display On (command 0x29)
          LPC_SSP->DR = (DISPON & ~0x100);
      }
      

    Both optimizations eliminate a number of bookkeeping instructions between the actual register writes, so the writes follow one another more quickly.
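
To see the effect of these two optimizations for yourself, a small host-side sketch along the following lines may help. It is not the original project code: FakeSPI, g_fakeDataRegister and SendColmod are hypothetical names, and a plain volatile variable stands in for LPC_SSP->DR so the file builds without the LPC headers. The call structure (WriteCommand/WriteData forwarding to a private Write) mirrors the class in the question.

```cpp
#include <cstdint>

// Hypothetical stand-in for LPC_SSP->DR. volatile tells the compiler that
// every write is observable, so no write may be dropped or merged.
volatile std::uint32_t g_fakeDataRegister = 0;

// Hypothetical class mirroring the call structure of SPI in the question.
class FakeSPI {
public:
    void WriteCommand(unsigned int command) { Write(command & ~0x100u); } // clear bit 8
    void WriteData(unsigned int data)       { Write(data | 0x100u); }     // set bit 8
private:
    void Write(unsigned int value) { g_fakeDataRegister = value; }
};

// Issues one command followed by one data word, then returns whatever
// value the last write left in the stand-in register.
unsigned int SendColmod() {
    FakeSPI spi;
    spi.WriteCommand(0x3A); // COLMOD, value taken from the comment in the question
    spi.WriteData(0x03);    // 12 bits-per-pixel
    return g_fakeDataRegister;
}
```

Assuming the file is saved as demo.cpp (an arbitrary name), compiling it twice with `g++ -O0 -S demo.cpp` and `g++ -O1 -S demo.cpp` and comparing the generated assembly for SendColmod should show how many instructions sit between the two stores in each case. The exact output depends on the compiler version and target, so treat it as an illustration rather than a prediction for the LPC build.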