getline() reading garbage value while reading from a text file


In the following code, I want to read a text file whose first line is <p> -> <many_declaration> <many_expression>. The code snippet is as follows:

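    ssize_t read;
    size_t len = 200;
    FILE *fptr;
    fptr = fopen(fn, "r");

    if (fptr == NULL) {
        /* fopen() failed, so there is no stream to close:
           calling fclose(NULL) is undefined behavior. */
        printf("Error\n");
    }
    else {
        int line_number = 1;

        /* getline() may realloc() this buffer and update len as needed. */
        char *line = malloc(len);

        while ((read = getline(&line, &len, fptr)) != -1)
        {
            /* Split the line on spaces, tabs and newlines. */
            char *tokens = strtok(line, " \t\n");

            while (tokens != NULL)
            {
                printf("%s \n", tokens);
                printf("%zu \n", strlen(tokens));

                tokens = strtok(NULL, " \t\n");
            }
        }

        /* Release the line buffer and close the file when done. */
        free(line);
        fclose(fptr);
    }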

However, when I read the file with getline() and then split each line with strtok(), the first token should be <p> with length 3, but the token actually read is some invisible characters followed by <p>, and its length is 6. Could you please tell me what the problem is? Thank you!


Solution

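  • The file probably starts with a UTF-8 byte order mark (BOM): the three bytes EF BB BF, which most editors do not display. That matches the symptom, since the 3 BOM bytes plus the 3 bytes of <p> give a length of 6. getline() reads raw bytes and does not strip the BOM, so if that is the cause, either skip those three bytes before tokenizing or save the file as UTF-8 without a BOM.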
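
A minimal sketch of skipping a leading UTF-8 BOM in C, assuming the stream is a regular (seekable) file and that the helper is called right after fopen(), before the first getline(). The name skip_utf8_bom is hypothetical and not part of any library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if the stream begins with the UTF-8 BOM
 * (bytes EF BB BF), consume it and return 1. Otherwise rewind to
 * the start of the file and return 0.
 *
 * Assumption: fp is positioned at the start of a seekable file.
 */
int skip_utf8_bom(FILE *fp)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buf[3];
    size_t n;

    assert(fp != NULL);

    /* Try to read the first three bytes of the file. */
    n = fread(buf, 1, sizeof buf, fp);
    if (n == sizeof buf && memcmp(buf, bom, sizeof bom) == 0)
        return 1;   /* BOM found and consumed */

    /* No BOM (or file shorter than 3 bytes): start over from byte 0.
       rewind() also clears any end-of-file indicator set by fread(). */
    rewind(fp);
    return 0;
}
```

In the question's code this would be called as skip_utf8_bom(fptr); just before the getline() loop. If a BOM was indeed the cause, the first token should then come out as <p> with length 3.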