string, malloc, return-value, c99, memory-corruption

C Function to return a String resulting in corrupted top size


I am trying to write a program that calls upon an [external library (?)] (I'm not sure that I'm using the right terminology here) that I am also writing to clean up a provided string. For example, if my main.c program were to be provided with a string such as:

asdfFAweWFwseFL Wefawf JAWEFfja FAWSEF

it would call upon a function in externalLibrary.c (let's call it externalLibrary_Clean for now) that would take in the string, and return all characters in upper case without spaces:

ASDFFAWEWFWSEFLWEFAWFJAWEFFJAFAWSEF

The crazy part is that I have this working... so long as my string doesn't exceed 26 characters in length. As soon as I add a 27th character, I end up with an error that says malloc(): corrupted top size.

Here is externalLibrary.c:

#include "externalLibrary.h"
#include <ctype.h>
#include <malloc.h>
#include <assert.h>
#include <string.h>

char * restrict externalLibrary_Clean(const char* restrict input) {
    // first we define the return value as a pointer and initialize
    // an integer to count the length of the string
    char * returnVal = malloc(sizeof(input));
    char * initialReturnVal = returnVal; //point to the start location

    // until we hit the end of the string, we use this while loop to
    // iterate through it
    while (*input != '\0') {
        if (isalpha(*input)) {  // if we encounter an alphabet character (a-z/A-Z)
                                // then we convert it to an uppercase value and point our return value at it
            *returnVal = toupper(*input);
            returnVal++; //we use this to move our return value to the next location in memory
            
        }
        input++; // we move to the next memory location on the provided character pointer
    }

    *returnVal = '\0'; //once we have exhausted the input character pointer, we terminate our return value

    return initialReturnVal;
}

int * restrict externalLibrary_getFrequencies(char * ar, int length){
    static int freq[26];
    for (int i = 0; i < length; i++){
        freq[(ar[i]-65)]++;
    }
    return freq;
}

the header file for it (externalLibrary.h):

#ifndef LEARNINGC_EXTERNALLIBRARY_H
#define LEARNINGC_EXTERNALLIBRARY_H

#ifdef __cplusplus
extern "C" {
#endif

char * restrict externalLibrary_Clean(const char* restrict input);
int * restrict externalLibrary_getFrequencies(char * ar, int length);

#ifdef __cplusplus
}
#endif

#endif //LEARNINGC_EXTERNALLIBRARY_H

my main.c file from where all the action is happening:

#include <stdio.h>
#include "externalLibrary.h"

int main() {
    char * unfilteredString = "ASDFOIWEGOASDGLKASJGISUAAAA";//if this exceeds 26 characters, the program breaks 
    char * cleanString = externalLibrary_Clean(unfilteredString);
    //int * charDist = externalLibrary_getFrequencies(cleanString, 25); //this works just fine... for now

    printf("\nOutput: %s\n", unfilteredString);
    printf("\nCleaned Output: %s\n", cleanString);
    /*for(int i = 0; i < 26; i++){
        if(charDist[i] == 0){

        }
        else {
            printf("%c: %d \n", (i + 65), charDist[i]);
        }
    }*/

    return 0;
}

I'm extremely well versed in Java programming and I'm trying to translate my knowledge over to C as I wish to learn how my computer works in more detail (and have finer control over things such as memory).

If I were solving this problem in Java, it would be as simple as creating two class files: one called main.java and one called externalLibrary.java, where I would have static String Clean(String input) and then call upon it in main.java with String cleanString = externalLibrary.Clean(unfilteredString).

Clearly this isn't how C works, but I want to learn how it does work (and why my code is crashing with corrupted top size).


Solution

  • The bug is this line:

    char * returnVal = malloc(sizeof(input));
    

    This is a bug because sizeof(input) is the size of the pointer itself, not of the string it points to, so the call requests only enough space to store a pointer, meaning 8 bytes in a 64-bit program. What you want is enough space to store the modified string, which you can allocate with the following line:

    char *returnVal = malloc(strlen(input) + 1);
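
The difference between the two expressions can be checked directly. The following is a minimal sketch; the helper names pointer_size and bytes_needed are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: what sizeof reports for a pointer parameter. */
size_t pointer_size(const char *input) {
    return sizeof(input);       /* size of the pointer itself, not the string */
}

/* Hypothetical helper: bytes needed to hold a copy of the string. */
size_t bytes_needed(const char *input) {
    return strlen(input) + 1;   /* every character plus the terminating '\0' */
}
```

For the 27-character string from the question, bytes_needed returns 28, while pointer_size returns sizeof(char *) no matter how long the string is.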
    

    The other part of your question is why the program doesn't crash when your string is shorter than 26 characters. The reason is that malloc is allowed to give the caller slightly more memory than the caller requested.

    In your case, the message "malloc(): corrupted top size" suggests that you are using the glibc malloc, which is the default on most Linux distributions. In a 64-bit process, that implementation always gives you at least 0x18 (24) usable bytes (the minimum chunk size of 0x20, minus 8 bytes for the size/status field). In the specific case where the allocation immediately precedes the "top" chunk, writing past the end of the allocation clobbers the size of "top".

    If your string is longer than 23 (0x17) characters, you will start to clobber the size/status field of the subsequent chunk, because you also need 1 byte to store the terminating null character ('\0'). Any string of 23 characters or fewer will not cause a problem.
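
If you want to see how much memory an allocation really received, glibc provides malloc_usable_size in <malloc.h>. The sketch below assumes glibc on Linux; the helper name usable_bytes_for is hypothetical, and the value it reports is an implementation detail that a program should never rely on.

```c
#include <malloc.h>   /* glibc-specific: declares malloc_usable_size */
#include <stdlib.h>

/* Hypothetical helper: allocate `requested` bytes and report how many
   bytes the allocator actually made usable for that block. */
size_t usable_bytes_for(size_t requested) {
    void *block = malloc(requested);
    if (block == NULL) {
        return 0;                 /* allocation failed */
    }
    size_t usable = malloc_usable_size(block);
    free(block);
    return usable;
}
```

On a 64-bit glibc system, usable_bytes_for(8) would be expected to report 24, matching the 0x18 figure above. Other allocators may report a different number.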

    As to why you didn't get an error with a string of 26 characters: a more precise answer would require seeing the exact program and the exact 26-character input that did not crash. For example, if the 26-character input contained 3 blanks, the cleaned string would require only 26 + 1 - 3 = 24 bytes in the allocation, which would fit.
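
The arithmetic above can be sketched as a small function that counts how many bytes the cleaned string occupies. The name cleaned_bytes_needed is hypothetical, and the sketch assumes that only alphabetic characters are kept, as in the question's code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: bytes written by the cleaning loop, i.e. one byte
   per alphabetic character plus one for the terminating '\0'. */
size_t cleaned_bytes_needed(const char *input) {
    size_t letters = 0;
    for (; *input != '\0'; input++) {
        if (isalpha((unsigned char)*input)) {
            letters++;
        }
    }
    return letters + 1;
}
```

A 26-character input with 3 blanks, such as "ABCDEFGH IJKLMNOP QRSTUV W", needs 24 bytes, while the 27-letter string from the question needs 28.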

    If you are not interested in that level of detail, fixing the malloc call to request the proper amount will fix your crash.
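
Putting the fix in context, here is a sketch of the whole function with the corrected allocation. The NULL check, the casts to unsigned char, and dropping restrict from the signature are additions for this sketch and are not part of the original code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, upper-cased copy of `input` that contains
   only its alphabetic characters, or NULL if the allocation fails. */
char *externalLibrary_Clean(const char *input) {
    /* The output is never longer than the input, so strlen(input) + 1
       bytes always leave room for the terminating '\0'. */
    char *result = malloc(strlen(input) + 1);
    if (result == NULL) {
        return NULL;
    }

    char *out = result;
    for (; *input != '\0'; input++) {
        /* Cast to unsigned char: passing a negative value to the
           <ctype.h> functions is undefined behavior. */
        if (isalpha((unsigned char)*input)) {
            *out++ = (char)toupper((unsigned char)*input);
        }
    }
    *out = '\0';

    return result;   /* the caller owns this buffer */
}
```

Unlike Java, C has no garbage collector, so main should check the result against NULL and call free(cleanString) once it has finished printing.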