Search code examples
calgorithmdiff

Why is calling free causing segfault


I am trying to implement the myers diff algorithm in C.

I am getting a segfault when trying to free the memory previously allocated to the path variable, used to store the x values for each iteration of d and k.

int ** MakeMoveSet(Comparison_t * comp) {
  int **moveset, *path;
  int64_t n = comp->file1LineCount, m = comp->file2LineCount;
  int64_t max_tries = n + m;
  int64_t max_index = 2 * max_tries + 1;
  int64_t prev, next;
  int x,y;

  fprintf(stderr, "max_tries=%ju, max_index=%ju\n", max_tries, max_index);
  
  path = malloc(sizeof(int) * max_index); //Array of ints;
  moveset = malloc(sizeof(path) * max_tries); //Array of paths

  path[1] = 0;

  for (int d = 0; d < max_tries; d++) {

    for (int k = -d; k <=d; k+=2) {
      fprintf(stderr, "d=%d,k=%d\n", d, k);
      prev = k - 1;
      next = k + 1;

      // Determine whether to move up or down
      if (k == -d || (k != d && path[prev] < path[next])) {
    x = path[next];
      } else {
    x = path[prev] + 1;
      }

      y = x - k;
      
      // Check diagonals to settle x at deepest point
      while ((x < n && y < m) && (strcmp(comp->file1Lines[x], comp->file2Lines[y]) == 0)) {
    x ++;
    y ++;
      }
      
      path[k] = x;

      fprintf(stderr, "x = %d, y=%d\n", x, y);

      if (x >= n && y >= m) {
        free(path); //Segfault occurs here
        fprintf(stderr, "Bailing\n");
        comp->shortest_edit_path = d;
        fprintf(stderr, "Shortest edit path = %d\n", d);
        return moveset;
      }
    }
  }

  return NULL;
  
}

Why is this occuring when the memory allocated to path was done using malloc, and path is not null when the free call is made?


Solution

  • Not having the supporting code for the structure and code calling this function, I used some artistic license to define the file path comparison structure.

    typedef struct cmp
    {
        int64_t file1LineCount;
        char file1Lines[20][100];
        int64_t file2LineCount;
        char file2Lines[20][100];
        int64_t shortest_edit_path;
    } Comparison_t;
    

    Then, built a very simple function call in the "main" function.

    int main()
    {
        Comparison_t * work = malloc(sizeof(Comparison_t));
    
        work->file1LineCount = 2;
        strcpy(work->file1Lines[0], "C_Programs/Console/FileCompare/bin/Release");
        work->file2LineCount = 2;
        strcpy(work->file2Lines[0], "C_Programs/Console/FileCompare/bin/Test");
        work->shortest_edit_path = 55;
    
        int ** test = MakeMoveSet(work);
    
        printf("Address of test: %p\n", test);
    
        return 0;
    }
    

    And finally, added a simple "printf" statement that focuses in on the likely bug as noted in the good comments about possibly an assignment of data into an array that was out of bounds.

                path[k] = x;
    
                printf("k value: %d\n", k);                     /* Visual debug print statement */
    
                fprintf(stderr, "x = %d, y=%d\n", x, y);
    
                if (x >= n && y >= m)
                {
                    free(path); //Segfault occurs here
                    fprintf(stderr, "Bailing\n");
                    comp->shortest_edit_path = d;
                    fprintf(stderr, "Shortest edit path = %d\n", d);
                    return moveset;
                }
    

    With that, executing a test of the code revealed that indeed some data assignments into the "path" array were occurring out of bounds (e.g. the value of "k" was a negative value), which subsequently causes undefined behavior.

    craig@Vera:~/C_Programs/Console/FileCompare/bin/Release$ ./FileCompare 
    max_tries=4, max_index=9
    d=0,k=0
    k value: 0
    x = 0, y=0
    d=1,k=-1
    k value: -1
    x = 0, y=1
    d=1,k=1
    k value: 1
    x = 1, y=0
    d=2,k=-2
    k value: -2
    x = 0, y=2
    d=2,k=0
    k value: 0
    x = 2, y=2
    free(): invalid pointer
    Aborted (core dumped)
    

    The simple answer here is that further bounds testing needs to happen so that data assignment outside the boundaries of the assigned memory or array does not occur. As a simple bit of refactoring, the following boundary testing was added.

            if (k >= 0 && k < max_tries)                    /* Boundary testing             */
                path[k] = x;
    
            printf("k value: %d\n", k);                     /* Visual debug print statement */
    
            fprintf(stderr, "x = %d, y=%d\n", x, y);
    
            if (x >= n && y >= m)
            {
                free(path); //Segfault occurs here
                fprintf(stderr, "Bailing\n");
                comp->shortest_edit_path = d;
                fprintf(stderr, "Shortest edit path = %d\n", d);
                return moveset;
            }
    

    This results in the following terminal output.

    craig@Vera:~/C_Programs/Console/FileCompare/bin/Release$ ./FileCompare 
    max_tries=4, max_index=9
    d=0,k=0
    k value: 0
    x = 0, y=0
    d=1,k=-1
    k value: -1
    x = 0, y=1
    d=1,k=1
    k value: 1
    x = 1, y=0
    d=2,k=-2
    k value: -2
    x = 0, y=2
    d=2,k=0
    k value: 0
    x = 2, y=2
    Bailing
    Shortest edit path = 2
    Address of test: 0x563c29420280
    

    Not knowing the full context of your program, the boundary testing might need to be more sophisticated; however, boundary testing did need to be added.

    The take-away from this is probably to delve into some more tutorials in the creation, usage, and deallocation of arrays along with either the usage of debugging tools and/or the addition of temporary print statements to hone in on possible issues.