Tags: c, segmentation-fault, printf, c-strings, implicit-conversion

C pointer concept segmentation fault


Please take a look at this snippet: I cannot understand why the code gives a segmentation fault and garbage characters.

#include <stdio.h>

int main()
{
    char *str[1];
    str[0] = "apple";

    char **ptr = str;
    char *p = str;

    printf("%d\n", ptr);  //---> 1956347328
    printf("%d\n", p);      //---> 1956347328

    printf("%s \n", *ptr, ); // ---> prints apple
    printf("%s\n", *p);    // ---> error segmentation fault
    printf("%s\n", p);    // ---> Garbage characters    
}

Solution

  • Here is a detailed explanation of the posted code behavior and problems:

    • char *str[1]; defines an array of pointers to char with a single element.

    • str[0] = "apple"; might trigger a warning, as you store a pointer to a string literal, which must not be modified, into an element of an array that holds pointers to modifiable data. As long as you do not use this pointer to actually try to modify the string, you are safe.

    • char **ptr = str; defines another object, a pointer to a pointer to char, initialized to point to the first element of the array str. No problem with that, but you can only access the element at ptr[0].

    • char *p = str; defines another object, a pointer to char initialized to point to the array str. This is a type mismatch, as the type of str is neither array of char nor pointer to char, hence the compiler should issue a warning. Yet as long as you only use this pointer to access the bytes that compose this array, you are safe. The array contains a single pointer (which points to the "apple" string), and on your 64-bit machine this pointer is composed of 8 bytes.

    • printf("%d\n", ptr) has undefined behavior because printf expects an argument of type int for the %d format, and you pass a char ** value. On your architecture, integers and pointers seem to be passed in the same register or stack area, so printf outputs an integer that is a part (the low 32 bits) of the pointer, 1956347328. You should instead use printf("%p\n", (void *)ptr);

    • printf("%d\n", p): same remark as above. The output is the same because both pointers hold the same address, but as they have different types, dereferencing them produces different things.

    • printf("%s \n", *ptr, ) is a syntax error because of the extra comma. Without it, printf receives *ptr, which is the first element of the array str, i.e. the char * pointer to the string "apple", which is what it expects for %s. printf outputs the string apple, a space and a newline.

    • printf("%s\n", *p): printf receives *p, which is the char at the beginning of the str array, i.e. the first byte of the pointer stored in str[0]. Its value lies in the range CHAR_MIN to CHAR_MAX (0..255 or -128..127, depending on the signedness of type char). printf expects a pointer to char as the argument for the %s conversion, so passing a char value has undefined behavior. In your case, printf treats this small integer as an address, which is an invalid pointer: dereferencing it causes a segmentation fault.

    • printf("%s\n", p): the pointer p is passed to printf for the %s conversion, which is fine, but printf reads and outputs the bytes it points to until it reads a null byte. These bytes are not those of the string apple; they are the bytes that compose the pointer stored in str[0], which look like garbage values when output as characters. If printf did not find a null byte as it reads them, it would invoke undefined behavior when reading past the end of the array and possibly cause a segmentation fault too.

    Consider this modified version:

    #include <stdio.h>
    
    void dump(const char *s, void *addr, size_t len) {
        printf("%s is at address %p, contains:", s, addr);
        unsigned char *p = addr;
        for (size_t i = 0; i < len; i++)
            printf(" %02x", p[i]);
        printf("\n");
    }
    
    int main(void) {
        char *str[1] = { (char *)"apple" };
        char **ptr = str;
        char *p = (char *)str;
    
        dump("str ", &str, sizeof(str));
        dump("ptr ", &ptr, sizeof(ptr));
        dump("p   ", &p, sizeof(p));
    
        dump("*ptr", *ptr, sizeof("apple"));
    
        printf("*ptr as string: %s\n", *ptr);
        printf("*p   as byte  : %02x\n", (unsigned char)*p);
    
        return 0;
    }
    

    I get this output on my MacBook:

    str  is at address 0x30cb5f278, contains: 6b ef 3d 04 01 00 00 00
    ptr  is at address 0x30cb5f270, contains: 78 f2 b5 0c 03 00 00 00
    p    is at address 0x30cb5f268, contains: 78 f2 b5 0c 03 00 00 00
    *ptr is at address 0x1043def6b, contains: 61 70 70 6c 65 00
    *ptr as string: apple
    *p   as byte  : 6b
    

    If I run the same executable again, I get different output because of address space layout randomisation (ASLR), a technique used to make it more difficult for attackers to exploit software vulnerabilities.

    The output shows that:

    • the 3 objects p, ptr and str have a size of 8 bytes.

    • they are located in adjacent places in memory (on the stack, below the main arguments).

    • p and ptr point to the area occupied by str: the address 0x30cb5f278 is stored in little endian order, i.e. from the least significant byte to the most significant byte (78 f2 b5 0c 03 00 00 00). This looks like reverse order, but the order in which to store the bytes of a pointer (or integer) is only a matter of convention, just like the order of parts of speech in a phrase varies from one language to another.

    • the string "apple", which contains the bytes 61 70 70 6c 65 00, is located at address 0x1043def6b (in a constant data segment). This is the value stored in str, again in little endian order.

    • *p has the value 6b, which is the character k, but on my laptop this value varies from one run to another, as explained above.