Search code examples
cwhile-loopc-stringsfunction-definitionstrstr

heap-buffer-overflow for Leetcode strStr problem


The problem is for LeetCode's Implement strStr. I have a working solution when I compile and execute in my computer, it works fine, but when I try to run it with LeetCode it gives me some weird error. I don't have any idea what is the problem.

This is my solution:

#include <stdio.h>
#include <stdlib.h>

int strStr(char *haystack, char *needle) {
    if (needle[0] == '\0')
        return 0;

    int i = 0;
    int j = 0;

    while (haystack[i] != '\0') {
        while (haystack[i] == needle[j] && haystack[i] != '\0' && needle[j] != '\0') {
            i++;
            j++;
        }

        if (needle[j] == '\0') {
            return i - j;
        } else {
            j = 0;
        }

        i++;
    }
    return -1;
}

int main() {

    printf("%d\n", strStr("aaa", "aaaa"));

    return 0;
}

and this is the error I am getting in LeetCode

=================================================================
==32==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014 at pc 0x5620a8c4472e bp 0x7fff98a004c0 sp 0x7fff98a004b0
READ of size 1 at 0x602000000014 thread T0
    #2 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
allocated by thread T0 here:
    #0 0x7fdce3b25bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #3 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==32==ABORTING

Please explain why this is occurring?


Solution

  • The problem is you increment i inside the inner loop and again in the outer loop, potentially skipping the null terminator, hence accessing bytes in haystack beyond the end, which has undefined behavior.

    You should only increment j in the inner loop and compare haystack[i + j] == needle[j].

    Here is a modified version:

    #include <stdio.h>
    
    int strStr(const char *haystack, const char *needle) {
        if (needle[0] == '\0')
            return 0;
    
        int i = 0;
    
        while (haystack[i] != '\0') {
            int j = 0;
            while (needle[j] != '\0' && haystack[i + j] == needle[j]) {
                j++;
            }
            if (needle[j] == '\0') {
                return i;
            }
            i++;
        }
        return -1;
    }
    
    int main() {
        printf("%d\n", strStr("aaaaaaaaab", "aaaab"));
        printf("%d\n", strStr("aaaaaaaa", "aaaaaaaaa"));
        printf("%d\n", strStr("Hello world\n", "or"));
        return 0;
    }
    

    Note that you can remove some redundant comparisons by reorganising the code:

    int strStr(const char *haystack, const char *needle) {
        for (int i = 0;; i++) {
            for (int j = 0;; j++) {
                if (needle[j] == '\0')
                    return i;
                if (haystack[i + j] != needle[j])
                    break;
            }
            if (haystack[i] == '\0')
                return -1;
        }
    }
    

    Note however that this method has a worst case time complexity of O(len(haystack)*len(needle)), with a pathological example of:

    strStr("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaab")