Search code examples
csplitsegmentation-faultc-stringsfunction-definition

Segamentation fault encountered while parsing contents of string separated by a delimiter


During an interview, I was asked to implement a solution for following problem in C programming language:

Given an input string with a fixed pattern of contents separated by a delimiter, I need to extract each specific content.

Pattern of input string: "starting_message|integer_value_1|integer_value_2|character_code|ending_message"

Expected output:

Starting message: starting_message
Value 1 : integer_value_1
Value 2 : integer_value_2
Char code : character_code
Ending message : ending_message

Example input: "HelloWorld|35|45|C|ByeWorld"

Example output:

Starting message: HelloWorld
Value 1 : 35
Value 2 : 45
Char code : C
Ending message : ByeWorld

I implemented the following code:

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
/*
Loop through the input string until termination.
Figure out the position of delimiters first. It will help in parsing later.
For string, count the number of chars and use memcpy for that number of elements.
To parse char to int, try atoi.
*/

void parse(const char* input, char* starting_message, int* value1, int* value2, char* char_code, char* ending_message)
{
  int ctr = 0, pos_ctr=0;
  int delim_pos[4];  /* To store the location of delimiters aka '|' */
  
  char* value1_str = (char*) malloc(10);
  char* value2_str = (char*) malloc(10);
  
  while(input[ctr]!= '\0')
  {
    if(input[ctr] == '|')
    {
      if(pos_ctr < 4)
      {
        delim_pos[pos_ctr] = ctr;
        pos_ctr++;
      }
    }
    ctr++;
  }
  
  memcpy(starting_message,input,(delim_pos[0]));  /* starting_message is contained in input string in between input[0] & input[delim_pos[0]]*/
  starting_message[delim_pos[0]+1] = '\0';
  
  memcpy(value1_str, input,(delim_pos[1]-delim_pos[0]));   /* value1 is contained in input string in between input[delim_pos[0]] & input[delim_pos[1]]*/
  value1_str[(delim_pos[1]-delim_pos[0] + 1)] = '\0';
  *value1 = atoi(value1_str);
  
  memcpy(value2_str, input,(delim_pos[2]-delim_pos[1]));   /* value2 is contained in input string in between input[delim_pos[1]] & input[delim_pos[2]]*/
  value1_str[(delim_pos[2]-delim_pos[1] + 1)] = '\0';
  *value2 = atoi(value2_str);
  
  *char_code = input[(delim_pos[2]+1)];     /* char_code is element next to input[delim_pos[2]]*/
  
  memcpy(ending_message, input, (delim_pos[3]-ctr-1));  /* ending_message is contained in input string in between input[delim_pos[3]] & null termination char*/
  ending_message[delim_pos[3]-ctr] = '\0';
  
}


int main() {

const char* input = "HelloWorld|35|45|C|ByeWorld";
char* starting_message = (char*) malloc(30);
char* ending_message = (char*) malloc(30);
int value1, value2;
char char_code;

parse(input, starting_message, &value1, &value2, &char_code, ending_message);


printf(" Input string: %s\n",input);
printf("Starting message : %s\n", starting_message);
printf("Value 1 : %d\n", value1);
printf("Value 2 : %d\n", value2);
printf("Character code : %c\n", char_code);
printf("Ending message: %s\n", ending_message);

    return 0;
}

I was able to compile but on the output screen, I encountered a segmentation fault with no output.

Output screen

Where did I go wrong and how to fix it?


Solution

  • For starters the function does not report whether parsing of the input string was successful.

    The function has too many parameters. Instead of this list of parameters

    char* starting_message, int* value1, int* value2, char* char_code, char* ending_message
    

    you could pass an object of a structure type. For example the function declaration could be

    int parse( const char *input, struct Fields *fields );
    

    These memory allocations

    char* value1_str = (char*) malloc(10);
    char* value2_str = (char*) malloc(10)
    

    are redundant and moreover due to these allocations the function produces memory leaks.

    Instead of the function atoi you should use the function strtol.

    Instead of this while loop

    while(input[ctr]!= '\0')
    

    you could use a loop within which there is called the function strchr.

    In this assignment

    starting_message[delim_pos[0]+1] = '\0';
    

    the character at the position delim_pos[0] can have indeterminate value. Instead of the statement above you have to write

    starting_message[delim_pos[0]] = '\0';
    

    In these statements

    memcpy(value1_str, input,(delim_pos[1]-delim_pos[0]));
    memcpy(value2_str, input,(delim_pos[2]-delim_pos[1]));
    memcpy(ending_message, input, (delim_pos[3]-ctr-1)); 
    

    you are copying characters beginning from the start of the string input that does not make a sense. Moreover this expression delim_pos[3]-ctr-1 will yield a negative value.

    This statement

    ending_message[delim_pos[3]-ctr] = '\0';
    

    invokes undefined behavior because the value of the expression delim_pos[3]-ctr is negative. ctr stores the length of the input string.

    Thus the function has undefined behavior and does not make a great sense.:)