Tags: gcc, arm, memory-alignment, qnx, qnx-neutrino

What kind of code can trigger an unaligned-data-access SIGBUS trap at run time?


I am investigating a SIGBUS caused by unaligned data access. I am tracking down one of these errors and would like to understand how it can happen on a Sitara AM335x. Can someone give me example code that illustrates the problem, or that reliably triggers it?
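As a minimal sketch, assuming a C++ compiler: the trap comes from reading or writing a multi-byte value through a pointer whose address is not a multiple of that type's alignment. On an ARMv7 core such as the Cortex-A8 in the AM335x, plain LDR/STR can tolerate unaligned addresses when the OS disables strict alignment checking, but LDRD/STRD and LDM/STM require at least 4-byte alignment, so whether a given line faults depends on which instructions the compiler emits and on how the OS configures the core. The names `is_aligned`, `load_u64`, `load_u64_unsafe` and the macro `TRIGGER_UNALIGNED` are hypothetical; the unsafe dereference is undefined behaviour in C++ and is compiled only when that macro is defined. On x86 the unsafe version will normally run without any trap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void *p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Portable load: memcpy lets the compiler choose accesses that are safe
// for any address, so this cannot raise an alignment fault.
inline std::uint64_t load_u64(const std::uint8_t *p)
{
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

#ifdef TRIGGER_UNALIGNED
// Undefined behaviour: dereferences a pointer that may be misaligned.
// On ARMv7 a 64-bit load like this may be compiled to LDRD or LDM,
// which fault on addresses that are not 4-byte aligned.
inline std::uint64_t load_u64_unsafe(const std::uint8_t *p)
{
    return *reinterpret_cast<const std::uint64_t *>(p);
}
#endif
```

With `alignas(8) std::uint8_t buf[16]`, `buf + 1` is misaligned for `std::uint64_t`: `load_u64(buf + 1)` is safe everywhere, while `load_u64_unsafe(buf + 1)` may raise SIGBUS on ARM.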

Here is the relevant code snippet:

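    #include <cstdint>
    #include <cstring>

    // DBANode and ourDataSize are defined elsewhere in the project.
    int Read( void *value, uint32_t *size, const uint32_t baseAddress )
    {
        // baseAddress is the mmap'ed address cast to uint32_t; skip the
        // 20-byte DBANode header to reach the user data.
        uint8_t *userDataAddress = (uint8_t *)( baseAddress + sizeof( DBANode ));
        memcpy( value, userDataAddress, ourDataSize );
        *size = ourDataSize;
        return 0;
    }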

DBANode is a class whose objects are 20 bytes. baseAddress is the address returned by mmap for a shared-memory file that holds a DBANode object; it is cast to uint32_t so that the pointer arithmetic can be done.
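Casting the mapped pointer to `uint32_t` works only because the target is 32-bit, and it turns the pointer into a plain integer. A sketch of the same arithmetic on a byte pointer, under stated assumptions: `DBANode` is replaced by a hypothetical 20-byte struct (the real class is not shown), the function name `ReadNode` is hypothetical, and the data size is passed in instead of read from the global `ourDataSize`. Because 20 is a multiple of 4 but not of 8, the user data is at best 4-byte aligned, so the sketch only ever copies it with `memcpy`. The early return for `MAP_FAILED` is an added defensive check, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>   // MAP_FAILED

// Hypothetical stand-in for the real 20-byte DBANode header.
struct DBANode {
    std::uint32_t fields[5];
};
static_assert(sizeof(DBANode) == 20, "header is assumed to be 20 bytes");

// Copies `size` bytes of user data that follow the DBANode header.
// `base` is the pointer returned by mmap; it stays a pointer, so no
// integer truncation can happen on 64-bit hosts.
int ReadNode(void *value, std::uint32_t *outSize, const void *base,
             std::uint32_t size)
{
    if (base == nullptr || base == MAP_FAILED)
        return -1;  // the mapping never succeeded
    const std::uint8_t *userData =
        static_cast<const std::uint8_t *>(base) + sizeof(DBANode);
    std::memcpy(value, userData, size);  // alignment-agnostic copy
    *outSize = size;
    return 0;
}
```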

This is the disassembly of that section:

    91a8:   e51b3010    ldr r3, [fp, #-16]
    91ac:   e5933000    ldr r3, [r3]
    91b0:   e51b0014    ldr r0, [fp, #-20]  ; 0xffffffec
    91b4:   e51b1008    ldr r1, [fp, #-8]
    91b8:   e1a02003    mov r2, r3
    91bc:   ebffe72b    bl  2e70 <memcpy@plt>
    91c0:   e51b3010    ldr r3, [fp, #-16]
    91c4:   e5932000    ldr r2, [r3]
    91c8:   e51b3018    ldr r3, [fp, #-24]  ; 0xffffffe8
    91cc:   e5832000    str r2, [r3]

    00002e70 <memcpy@plt>:
    2e70:   e28fc600    add ip, pc, #0, 12
    2e74:   e28cca08    add ip, ip, #8, 20  ; 0x8000
    2e78:   e5bcf868    ldr pc, [ip, #2152]!    ; 0x868

When the exact same code base was rebuilt, the problem simply disappeared. Can GCC generate two different instruction sequences from the same source with the same optimization level (-O0)?

I also diffed the objdump output of the library .so files from both builds, and they are identical. The API is called frequently (I read the same node every 500 ms), yet the crash only happens after several days of continuous use, so it is not reproducible on demand. Should I be looking for pointer corruption?
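Because the crash only shows up after days of running, it can help to have the process log exactly which address faulted. A minimal sketch, assuming POSIX `sigaction` with `SA_SIGINFO` is available (it is on QNX Neutrino and Linux): the handler records `si_code` and `si_addr`, writes them to stderr using only async-signal-safe calls, and relies on `SA_RESETHAND` so that a genuine fault still terminates the process with the default action. The names `on_sigbus`, `put_hex`, `install_sigbus_logger` and the `g_bus_*` globals are hypothetical. Comparing the logged `si_addr` with the pointer that `mmap` returned shows whether the base pointer itself is bad.

```cpp
#include <cassert>
#include <cstdint>
#include <signal.h>
#include <unistd.h>

// Values captured by the handler for later inspection.
volatile sig_atomic_t g_bus_signo = 0;
volatile sig_atomic_t g_bus_code = 0;
void *volatile g_bus_addr = nullptr;

// Writes `digits` lowercase hex digits of v into dst (no allocation).
static void put_hex(char *dst, std::uintptr_t v, int digits)
{
    static const char hex[] = "0123456789abcdef";
    for (int i = digits - 1; i >= 0; --i) {
        dst[i] = hex[v & 0xF];
        v >>= 4;
    }
}

// Async-signal-safe: only plain stores and write(2) are used.
extern "C" void on_sigbus(int signo, siginfo_t *info, void *)
{
    g_bus_signo = signo;
    g_bus_code = info->si_code;
    g_bus_addr = info->si_addr;

    char msg[] = "SIGBUS code=00000000 addr=0000000000000000\n";
    put_hex(msg + 12, static_cast<std::uint32_t>(info->si_code), 8);
    put_hex(msg + 26, reinterpret_cast<std::uintptr_t>(info->si_addr), 16);
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    // SA_RESETHAND has already restored SIG_DFL, so after a real fault the
    // faulting access re-executes and the default action ends the process.
}

// Installs the handler as a one-shot logger for SIGBUS.
bool install_sigbus_logger()
{
    struct sigaction sa {};
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, nullptr) == 0;
}
```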


Solution

  • It turns out that baseAddress is the issue. As mentioned above, it comes from an mmap of a shared-memory object, and that mmap can fail. A failed mmap returns MAP_FAILED, i.e. (void *) -1, but the code was checking for NULL and went on to write to address -1, i.e. 0xFFFFFFFF, which caused the SIGBUS. With memcpy the SIGBUS has code 1; any other access, such as direct byte addressing, gives a SIGBUS with code 3.

    I am still not sure why this raises SIGBUS rather than SIGSEGV. Shouldn't it be reported as a memory-access violation? Here is an example:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <iostream>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        // Shared memory example
        const char *NAME = "SharedMemory";
        const int SIZE = 10 * sizeof(uint8_t);
        uint8_t src[] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0x00};
        int shm_fd = -1;

        // Opened read-only: ftruncate fails on this descriptor, and the
        // PROT_WRITE + MAP_SHARED mmap below fails with EACCES.
        shm_fd = shm_open(NAME, O_CREAT | O_RDONLY, 0666);
        ftruncate(shm_fd, SIZE);

        // Map shared memory segment to address space
        uint8_t *ptr = (uint8_t *) mmap(0, SIZE, PROT_READ | PROT_WRITE | _NOCACHE, MAP_SHARED, shm_fd, 0);
        if (ptr == MAP_FAILED)
        {
            std::cerr << "ERROR in mmap()" << std::endl;
            // return -1;   // left commented out, so execution continues with MAP_FAILED
        }
        printf("ptr = %p\n", (void *) ptr);
        std::cout << "Now storing data to mmap() memory" << std::endl;
    #if 0
        ptr[0] = 0x11;
        ptr[1] = 0x22;
        ptr[2] = 0x33;
        ptr[3] = 0x44;
        ptr[4] = 0x55;
        ptr[5] = 0x66;
        ptr[6] = 0x77;
        ptr[7] = 0x88;
        ptr[8] = 0x99;
        ptr[9] = 0x00;
    #endif

        memcpy(ptr, src, SIZE);   // causes SIGBUS code 1
        shm_unlink(NAME);
        return 0;
    }
    

    I still do not know why mmap fails on the shared-memory object, even though I have 100 MB of RAM available, all my resource limits are set to unlimited, and over 400 of the 1000 allowed file descriptors are still free.
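Two points in the example are worth checking in the real code. First, `mmap` reports failure with `MAP_FAILED` (`(void *) -1`), not `NULL`, and sets `errno`; logging `strerror(errno)` at the failure site usually names the cause. Second, in the example itself `shm_open` is called with `O_RDONLY` while `mmap` requests `PROT_WRITE` with `MAP_SHARED`, and POSIX requires `mmap` to fail with `EACCES` in that case. The sketch below wraps `mmap` in a hypothetical `map_checked` helper that returns `nullptr` plus the `errno` value on failure. Its usage example uses an ordinary temporary file, on the assumption that the descriptor-mode rule is the same for a descriptor returned by `shm_open`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstdlib>   // mkstemp, used in the usage example
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct Mapping {
    void *addr;  // nullptr on failure, never MAP_FAILED
    int error;   // errno from mmap, or 0 on success
};

// Hypothetical helper: wraps mmap so callers can test against nullptr
// without mistaking MAP_FAILED for a usable address.
Mapping map_checked(std::size_t len, int prot, int flags, int fd)
{
    void *p = mmap(nullptr, len, prot, flags, fd, 0);
    if (p == MAP_FAILED) {
        int err = errno;  // save before any other call can change it
        std::fprintf(stderr, "mmap failed: %s\n", std::strerror(err));
        return Mapping{nullptr, err};
    }
    return Mapping{p, 0};
}
```

For example, mapping a file descriptor opened with `O_RDONLY` using `PROT_READ | PROT_WRITE` and `MAP_SHARED` yields `addr == nullptr` and `error == EACCES`, while the same call on a read/write descriptor succeeds.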