Search code examples
cstack-overflowfirmware

Does this sound like a stack overflow?


I think I might be having a stack overflow problem or something similar in my embedded firmware code. I am a new programmer and have never dealt with a SO so I'm not sure if that is what's happening or not.

The firmware controls a device with a wheel that has magnets evenly spaced around it and the board has a hall effect sensor that senses when magnet is over it. My firmware operates the stepper and also count steps while monitoring the magnet sensor in order to detect if the wheel has stalled.

I am using a timer interrupt on my chip (8 bit, 8057 acrh.) to set output ports to control the motor and for the stall detection. The stall detection code looks like this...

    //   Enter ISR
    //   Change the ports to the appropriate value for the next step
    //    ...

    StallDetector++;      // Increment the stall detector

    if(PosSensor != LastPosMagState)
    {
        StallDetector = 0;

        LastPosMagState = PosSensor;
    }
    else
    {
        if (PosSensor == ON) 
        {
            if (StallDetector > (MagnetSize + 10))
            {
                HandleStallEvent();
            }
        }
        else if (PosSensor == OFF) 
        {
            if (StallDetector > (GapSize + 10))
            {
                HandleStallEvent();
            }
        }
    }

this code is called every time the ISR is triggered. PosSensor is the magnet sensor. MagnetSize is the number of stepper steps that it takes to get through the magnet field. GapSize is the number of steps between two magnets. So I want to detect if the wheel gets stuck either with the sensor over a magnet or not over a magnet.

This works great for a long time but then after a while the first stall event will occur because 'StallDetector > (MagnetSize + 10)' but when I look at the value of StallDetector it is always around 220! This doesn't make sense because MagnetSize is always around 35. So the stall event should have been triggered at like 46 but somehow it got all the way up to 220? And I don't set the value of stall detector anywhere else in my code.

Do you have any advice on how I can track down the root of this problem?

The ISR looks like this

void Timer3_ISR(void) interrupt 14
{
    OperateStepper();  // This is the function shown above
    TMR3CN &= ~0x80;   // Clear Timer3 interrupt flag        
}

HandleStallEvent just sets a few variable back to their default values so that it can attempt another move...

#pragma save
#pragma nooverlay
void HandleStallEvent()
{
///*
    PulseMotor = 0;                 //Stop the wheel from moving
    SetMotorPower(0);               //Set motor power low
    MotorSpeed = LOW_SPEED;
    SetSpeedHz();
    ERROR_STATE = 2;
    DEVICE_IS_HOMED = FALSE;
    DEVICE_IS_HOMING = FALSE;
    DEVICE_IS_MOVING = FALSE;
    HOMING_STATE = 0;
    MOVING_STATE = 0;
    CURRENT_POSITION = 0;
    StallDetector = 0;
    return;
//*/
}
#pragma restore

Solution

  • Is PosSensor volatile? That is, do you update PosSensor somewhere, or is it directly reading a GPIO?

    I assume GapSize is rather large (> 220?) It sounds to me like you might have a race condition.

    // PosSensor == OFF, LastPosMagState == OFF
        if(PosSensor != LastPosMagState)
        {
            StallDetector = 0;
    
            LastPosMagState = PosSensor;
        }
        else
        {
    // Race Condition: PosSensor turns ON here
    // while LastPosMagState still == OFF
            if (PosSensor == ON) 
            {
                if (StallDetector > (MagnetSize + 10))
                {
                    HandleStallEvent();
                }
            }
            else if (PosSensor == OFF) 
            {
                if (StallDetector > (GapSize + 10))
                {
                    HandleStallEvent();
                }
            }
        }
    

    You should cache the value of PosSensor once, right after doing StallDetector++, so that in the event PosSensor changes during your code, you don't start testing the new value.