Search code examples
cduplicatescharc-stringsfunction-definition

Remove Adjacent Duplicates in String if the occurrence happens for k times--- heap buffer overflow error


Given a string s and an integer k, a k duplicate removal consists of choosing k adjacent and equal letters from s and removing them, causing the left and the right side of the deleted substring to concatenate together. We repeatedly make k duplicate removals on s until we no longer can. I tried the following code and it works in one machine(Machine 1) and throws an error in another machine(Machine 2).

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int sizeOfString(char * s)
{
    int count=0;
    while(s[count]!='\0')
        count ++;
    
    return count;
}

int getSubString(char *source, char *target,int from, int to)
{
    int i=0,j=0;
    for(i=from,j=0;i<=to;i++,j++){
        target[j]=source[i];
    }
    target[j]='\0'; 
    return 0;   
}

char * removeDuplicates(char * s, int k){
        int sizeString=sizeOfString(s);
        char *stringS= (char*)calloc(sizeString , sizeof(char));
        strcpy( stringS, s);
        int occuranceCount=1;
        int i=1;
        for (i=1;i<sizeString;i++)
        {
          if (s[i]==s[i-1])
          {
              occuranceCount++;
          }
          else{
              occuranceCount=1;
          }
          if (occuranceCount==k)
          {
              char *firstPart=(char*)calloc(i-k ,sizeof(char));
              getSubString(s,firstPart,0,i-k);
              char *secondPart=(char*)calloc(sizeString ,sizeof(char));
              getSubString(s,secondPart,i+1,sizeString-1);
              strcat(firstPart,secondPart);
              strcpy(s,firstPart);
              free (firstPart);
              free(secondPart);
              removeDuplicates(s,k);
          }
            
        }
    free(stringS);
    return s;
            
}

int main()
{
    char word[]="adbbbcccb";
    printf("after truncating the string is %s\n",removeDuplicates(word,3));
    return 0;
}

Machine 1 output:

after truncating the string is adb


...Program finished with exit code 0
Press ENTER to exit console.

Machine 2 throws a heap buffer overflow error at a particular value of PC, BP, and SPthe error is shown in the attached picture . What I am doing wrong?


Solution

  • For starters it is unclear why you wrote your own function sizeOfString instead of using the standard string function strlen when on the other hand you are using other standard string functions as for example strcpy or strcat.

    Either you should use everywhere where it is required standard string functions or you should not use standard string function altogether.

    The used approach with dynamically allocating memory is inefficient and unsafe.

    For example this code snippet

        int sizeString=sizeOfString(s);
        char *stringS= (char*)calloc(sizeString , sizeof(char));
        strcpy( stringS, s);
    

    results in memory overflow because there is not allocated a space for the terminating zero of the source string.

    The function has bugs. For example let's assume that k is equal to 2 and the string starts with two equal characters. In this case you have (see this code snippet)

       int occuranceCount=1;
        int i=1;
        for (i=1;i<sizeString;i++)
        {
          if (s[i]==s[i-1])
          {
              occuranceCount++;
          }
          else{
              occuranceCount=1;
          }
          if (occuranceCount==k)
          {
              char *firstPart=(char*)calloc(i-k ,sizeof(char));
              //...
    

    i is equal to 1, occuranceCount was incremented and become equal to 2 because the second character of the string is equal to the first character of the string. Thus the expression i - k in the call of calloc is equal to -1 that results in invalid size of the allocated memory that can lead to undefined behavior.

    Or for example after this statement

    strcpy(s,firstPart);
    

    and the recursive call of the function itself

    removeDuplicates(s,k);
    

    the length of the string s and the used index i become invalid. So the for loop in turn will be invalid.

    I can suggest the following non-recursive function shown in the demonstrative program below. The function is not based on using standard C string functions and does not allocate dynamically memory.

    #include <stdio.h>
    
    char * removeDuplicates( char *s, size_t n )
    {
        if ( n == 1 )
        {
            *s = '\0';
        }
        else if ( n != 0 )
        {
            char *p = s;
            for ( char *q = s; *q; )
            {
                char c = *q;
                size_t i = 1;
                while ( i < n && *++q == c ) ++i;
                
                if ( i != n )
                {
                    if ( p != q - i )
                    {
                        while ( i ) *p++ = *( q - i-- );
                    }
                    else
                    {
                        p += i;
                    }
                }
                else
                {
                    ++q;
                }
            }
            
            *p = '\0';
        }
        
        return s;
    }
    
    int main(void) 
    {
        char s0[] = "112223333";
        
        puts( removeDuplicates( s0, 0 ) );
        
        char s1[] = "112223333";
        
        puts( removeDuplicates( s1, 1 ) );
        
        char s2[] = "112223333";
        
        puts( removeDuplicates( s2, 2 ) );
        
        char s3[] = "adbbbcccb";
        
        puts( removeDuplicates( s3, 3 ) );
        
        return 0;
    }
    

    The program output is

    112223333
    
    2
    adb