I am trying to implement the myers diff algorithm in C.
I am getting a segfault when trying to free the memory previously allocated to the path variable, used to store the x values for each iteration of d and k.
int ** MakeMoveSet(Comparison_t * comp) {
int **moveset, *path;
int64_t n = comp->file1LineCount, m = comp->file2LineCount;
int64_t max_tries = n + m;
int64_t max_index = 2 * max_tries + 1;
int64_t prev, next;
int x,y;
fprintf(stderr, "max_tries=%ju, max_index=%ju\n", max_tries, max_index);
path = malloc(sizeof(int) * max_index); //Array of ints;
moveset = malloc(sizeof(path) * max_tries); //Array of paths
path[1] = 0;
for (int d = 0; d < max_tries; d++) {
for (int k = -d; k <=d; k+=2) {
fprintf(stderr, "d=%d,k=%d\n", d, k);
prev = k - 1;
next = k + 1;
// Determine whether to move up or down
if (k == -d || (k != d && path[prev] < path[next])) {
x = path[next];
} else {
x = path[prev] + 1;
}
y = x - k;
// Check diagonals to settle x at deepest point
while ((x < n && y < m) && (strcmp(comp->file1Lines[x], comp->file2Lines[y]) == 0)) {
x ++;
y ++;
}
path[k] = x;
fprintf(stderr, "x = %d, y=%d\n", x, y);
if (x >= n && y >= m) {
free(path); //Segfault occurs here
fprintf(stderr, "Bailing\n");
comp->shortest_edit_path = d;
fprintf(stderr, "Shortest edit path = %d\n", d);
return moveset;
}
}
}
return NULL;
}
Why is this occuring when the memory allocated to path was done using malloc, and path is not null when the free call is made?
Not having the supporting code for the structure and code calling this function, I used some artistic license to define the file path comparison structure.
typedef struct cmp
{
int64_t file1LineCount;
char file1Lines[20][100];
int64_t file2LineCount;
char file2Lines[20][100];
int64_t shortest_edit_path;
} Comparison_t;
Then, built a very simple function call in the "main" function.
int main()
{
Comparison_t * work = malloc(sizeof(Comparison_t));
work->file1LineCount = 2;
strcpy(work->file1Lines[0], "C_Programs/Console/FileCompare/bin/Release");
work->file2LineCount = 2;
strcpy(work->file2Lines[0], "C_Programs/Console/FileCompare/bin/Test");
work->shortest_edit_path = 55;
int ** test = MakeMoveSet(work);
printf("Address of test: %p\n", test);
return 0;
}
And finally, added a simple "printf" statement that focuses in on the likely bug as noted in the good comments about possibly an assignment of data into an array that was out of bounds.
path[k] = x;
printf("k value: %d\n", k); /* Visual debug print statement */
fprintf(stderr, "x = %d, y=%d\n", x, y);
if (x >= n && y >= m)
{
free(path); //Segfault occurs here
fprintf(stderr, "Bailing\n");
comp->shortest_edit_path = d;
fprintf(stderr, "Shortest edit path = %d\n", d);
return moveset;
}
With that, executing a test of the code revealed that indeed some data assignments into the "path" array were occurring out of bounds (e.g. the value of "k" was a negative value), which subsequently causes undefined behavior.
craig@Vera:~/C_Programs/Console/FileCompare/bin/Release$ ./FileCompare
max_tries=4, max_index=9
d=0,k=0
k value: 0
x = 0, y=0
d=1,k=-1
k value: -1
x = 0, y=1
d=1,k=1
k value: 1
x = 1, y=0
d=2,k=-2
k value: -2
x = 0, y=2
d=2,k=0
k value: 0
x = 2, y=2
free(): invalid pointer
Aborted (core dumped)
The simple answer here is that further bounds testing needs to happen so that data assignment outside the boundaries of the assigned memory or array does not occur. As a simple bit of refactoring, the following boundary testing was added.
if (k >= 0 && k < max_tries) /* Boundary testing */
path[k] = x;
printf("k value: %d\n", k); /* Visual debug print statement */
fprintf(stderr, "x = %d, y=%d\n", x, y);
if (x >= n && y >= m)
{
free(path); //Segfault occurs here
fprintf(stderr, "Bailing\n");
comp->shortest_edit_path = d;
fprintf(stderr, "Shortest edit path = %d\n", d);
return moveset;
}
This results in the following terminal output.
craig@Vera:~/C_Programs/Console/FileCompare/bin/Release$ ./FileCompare
max_tries=4, max_index=9
d=0,k=0
k value: 0
x = 0, y=0
d=1,k=-1
k value: -1
x = 0, y=1
d=1,k=1
k value: 1
x = 1, y=0
d=2,k=-2
k value: -2
x = 0, y=2
d=2,k=0
k value: 0
x = 2, y=2
Bailing
Shortest edit path = 2
Address of test: 0x563c29420280
Not knowing the full context of your program, the boundary testing might need to be more sophisticated; however, boundary testing did need to be added.
The take-away from this is probably to delve into some more tutorials in the creation, usage, and deallocation of arrays along with either the usage of debugging tools and/or the addition of temporary print statements to hone in on possible issues.