Tags: c, gdb, valgrind, segmentation-fault, sigbus

Can an invalid read/write cause a SIGBUS error?


EDIT 1: Platform is x86_64 for the sample program.

EDIT 2: I am rewording this for clarity. There are two separate questions here. First: can an invalid read/write cause SIGBUS? Second: is Valgrind useful for analyzing a SIGBUS? The sample code relates to the second question and supports my view that Valgrind is not useful at all for a SIGBUS error. I might be wrong here.

Actual scenario: We have a screen reader app that crashes after two days of continuous testing (it has crashed once, with SIGBUS). I have a core dump file, but I don't have the matching binary and debug packages. So I have to test with a different binary, and the core dump does not load properly in gdb because the debug packages don't match. Valgrind reports some invalid reads/writes in the screen-reader module. My teammate suggested that fixing these invalid reads/writes will resolve the crash, but I don't think it will. Below is my understanding of the two signals.

SIGSEGV: The address is valid, but the process does not have read/write permission for it.

SIGBUS: The address itself is invalid (the CPU cannot access it, for example because of misalignment).

I have a question about the SIGBUS signal. I searched for similar questions on Stack Overflow but didn't find a clear answer.

Can an invalid read/write cause a bus error (SIGBUS)?

My understanding is that an invalid read/write always causes a segmentation fault (SIGSEGV), and that the best way to track down a bus error is to run the application under gdb. Valgrind analysis will not help at all with a bus error. The code below illustrates this in more detail.

#include<stdlib.h>
#include<stdio.h>

typedef struct {
char *name;
int val;
}data;

void fun1()
{
    data *ptr = malloc(sizeof(data));
    ptr->val = 100;
    ptr->name = "name in structure";

    printf("val:%d name:%s\n",ptr->val,ptr->name);
    free(ptr);
    ptr = NULL;
    printf("val:%d name:%s\n",ptr->val,ptr->name); // SIGSEGV: ptr is NULL here
    return;
}

int fun2()
{
    #if defined(__GNUC__) 
    # if defined(__i386__) 
    /* Enable Alignment Checking on x86 */
    __asm__("pushf\norl $0x40000,(%esp)\npopf"); 
    # elif defined(__x86_64__)  
    /* Enable Alignment Checking on x86_64 */
    __asm__("pushf\norl $0x40000,(%rsp)\npopf"); 
    # endif 
    #endif 

    char *cptr = malloc(sizeof(int) + 1);
    char *optr = cptr;
    int *iptr = (int *) ++cptr; 
    *iptr = 42; // SIGBUS: misaligned store with alignment checking (AC flag) enabled
    free(optr);

    return 0; 
}

void fun()
{
    fun2();
    //fun1();
}

int main()
{
    fun();
    return 0;
}

For the segmentation fault, the Valgrind report identifies the code that causes the crash, but for the SIGBUS crash I didn't find any such details in the Valgrind report.

Valgrind report for SIGSEGV:

==28128== Memcheck, a memory error detector
==28128== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28128== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28128== Command: ./a.out
==28128== Parent PID: 27953
==28128== 
==28128== Invalid read of size 8
==28128==    at 0x400619: fun1 (tmp.c:18)
==28128==    by 0x400695: fun (tmp.c:46)
==28128==    by 0x4006A6: main (tmp.c:51)
==28128==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==28128== 
==28128== 
==28128== Process terminating with default action of signal 11 (SIGSEGV)
==28128==  Access not within mapped region at address 0x0
==28128==    at 0x400619: fun1 (tmp.c:18)
==28128==    by 0x400695: fun (tmp.c:46)
==28128==    by 0x4006A6: main (tmp.c:51)
==28128==  If you believe this happened as a result of a stack
==28128==  overflow in your program's main thread (unlikely but
==28128==  possible), you can try to increase the size of the
==28128==  main thread stack using the --main-stacksize= flag.
==28128==  The main thread stack size used in this run was 8388608.
==28128== 
==28128== HEAP SUMMARY:
==28128==     in use at exit: 0 bytes in 0 blocks
==28128==   total heap usage: 2 allocs, 2 frees, 1,040 bytes allocated
==28128== 
==28128== All heap blocks were freed -- no leaks are possible
==28128== 
==28128== For counts of detected and suppressed errors, rerun with: -v
==28128== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Valgrind report for SIGBUS:

==28176== Memcheck, a memory error detector
==28176== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28176== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28176== Command: ./a.out
==28176== Parent PID: 27953
==28176== 
==28176== 
==28176== HEAP SUMMARY:
==28176==     in use at exit: 0 bytes in 0 blocks
==28176==   total heap usage: 1 allocs, 1 frees, 5 bytes allocated
==28176== 
==28176== All heap blocks were freed -- no leaks are possible
==28176== 
==28176== For counts of detected and suppressed errors, rerun with: -v
==28176== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Solution

  • This code

        int *iptr = (int *) ++cptr;
        *iptr = 42; //SIGBUS

    violates multiple parts of the C standard.

    You're running afoul of 6.3.2.3 Pointers, paragraph 7:

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

    as well as violating the strict-aliasing rule of 6.5 Expressions, paragraph 7:

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

    • a type compatible with the effective type of the object,
    • a qualified version of a type compatible with the effective type of the object,
    • a type that is the signed or unsigned type corresponding to the effective type of the object,
    • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    • a character type.
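If the goal is only to store an int at an odd byte offset, one common way to stay within both rules quoted above is to copy the bytes with memcpy() instead of casting the pointer. The following is a minimal sketch, not a drop-in fix for your application; store_int_unaligned, load_int_unaligned, and demo_unaligned_store are hypothetical names introduced here for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Store an int at an arbitrary, possibly misaligned, byte address.
 * memcpy copies the bytes as characters, so no misaligned int* is formed. */
void store_int_unaligned(void *dst, int value)
{
    memcpy(dst, &value, sizeof value);
}

/* Read an int back from an arbitrary byte address. */
int load_int_unaligned(const void *src)
{
    int value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Same layout as fun2(): a (sizeof(int) + 1)-byte block, int stored at offset 1.
 * Returns the value read back, or -1 if the allocation fails. */
int demo_unaligned_store(void)
{
    char *buf = malloc(sizeof(int) + 1);
    if (buf == NULL)
        return -1;
    store_int_unaligned(buf + 1, 42);
    int out = load_int_unaligned(buf + 1);
    free(buf);
    return out;
}
```

This is well-defined as far as the C standard is concerned. Note, though, that with the AC flag forced on as in fun2(), an x86 compiler may still turn the memcpy() into a single unaligned move, so this addresses the language-level problem rather than the hardware alignment check.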

    Per the Valgrind documentation for Memcheck:

    4.1. Overview

    Memcheck is a memory error detector. It can detect the following problems that are common in C and C++ programs.

    • Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.

    • Using undefined values, i.e. values that have not been initialised, or that have been derived from other undefined values.

    • Incorrect freeing of heap memory, such as double-freeing heap blocks, or mismatched use of malloc/new/new[] versus free/delete/delete[]

    • Overlapping src and dst pointers in memcpy and related functions.

    • Passing a fishy (presumably negative) value to the size parameter of a memory allocation function.

    • Memory leaks.

    Note that your code

    int *iptr = (int *) ++cptr; 
    *iptr = 42; //SIGBUS
    

    does none of the things Valgrind claims to detect. You're not accessing memory you don't have permission to access, nor are you accessing memory outside the bounds of the region you created with malloc(). You haven't free()'d the memory yet. You have no uninitialized variables, you're not double-free()ing memory, nor are you using memcpy() improperly with overlapping source and destination regions. And you're not passing negative/"fishy" sizes to allocation functions. And you're not leaking any memory.
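Since Memcheck does not look at alignment, a misaligned pointer has to be caught some other way. GCC and Clang provide -fsanitize=alignment (part of UndefinedBehaviorSanitizer) for this; a hand-written check is also possible. The sketch below assumes a mainstream platform where converting a pointer to uintptr_t yields its numeric address; is_aligned_for_int is a hypothetical helper name, not something from your code.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if p can be dereferenced as an int without violating
 * the alignment requirement of int on this platform.
 * Assumption: uintptr_t holds the plain numeric address of p. */
bool is_aligned_for_int(const void *p)
{
    return ((uintptr_t)p % alignof(int)) == 0;
}
```

Checking the char pointer with such a helper before the cast in fun2() would flag the problem without relying on the AC flag or on a crash.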

    So, no, Valgrind doesn't even claim to be able to detect code that will cause a SIGBUS.
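As for the first question: misalignment is not the only source of SIGBUS. On POSIX systems, touching a page of a file mapping that lies wholly beyond the end of the file is specified to raise SIGBUS, even though the address is mapped and the access is permitted. The sketch below contrasts that with a write to a page mapped with PROT_NONE, which is expected to raise SIGSEGV. It assumes Linux with a writable /tmp; the function names and the temp-file template are hypothetical, chosen for illustration.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write to a page of a shared file mapping that lies past end-of-file.
 * The mapping is two pages long, but the file backs only the first page. */
void touch_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* hypothetical temp name */
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        _exit(100);
    unlink(path);                              /* file vanishes when fd closes */
    if (ftruncate(fd, page) != 0)
        _exit(101);
    volatile char *map = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        _exit(102);
    map[page] = 1;                             /* second page: expected SIGBUS */
    _exit(0);
}

/* Write to a page that is mapped without any access permission. */
void touch_protected_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile char *map = mmap(NULL, (size_t)page, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        _exit(102);
    map[0] = 1;                                /* expected SIGSEGV */
    _exit(0);
}

/* Run fn in a child process and return the number of the signal that
 * killed it, 0 if it exited normally, or -1 if fork/waitpid failed. */
int signal_raised_by(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux, signal_raised_by(touch_past_eof) is expected to return SIGBUS and signal_raised_by(touch_protected_page) to return SIGSEGV. So an access that looks valid to the program can still end in SIGBUS, for example when a mapped file is truncated by another process.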