
How to properly get a line and parse it with C


I'm writing a C program that will open a file, write to it, and then read what was written. I can open, write, and close the file, but I can't read the lines and parse them correctly.

I have read many other blogs and sites, but none fully address what I'm trying to do. I've tried adapting their general solutions, but I never get the behavior I want. I have run this code with fgets(), gets(), strtok(), scanf(), and fscanf(). I used strtok_r() because it was recommended as best practice. I used gets() and scanf() as experiments, to compare their output with that of fgets() and fscanf().

What I want to do:

  1. get the first line // the first line is a string of space-delimited ints "1 2 3 4 5"
  2. parse this line, converting each number in the string into an integer
  3. store this into an array.
  4. get the next line and repeat until EOF

Can someone please tell me what I'm missing and what functions would be considered best practice?

Thanks

My code:

#include <stdio.h> 
#include <pthread.h> 
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(){
  FILE * file;

  // read data from data.txt
  char lines[30];
  file = fopen("data.txt", "r"); 
  // data.txt currently holds five lines
  // 1 1 1 1 1 
  // 2 2 2 2 2
  // 3 3 3 3 3
  // 4 4 4 4 4 
  // 5 5 5 5 5

  char *number;
  char *next = lines;


  int s = 0;
  int t = 0;
  int num;
  int prams[30][30];

  while(fgets(lines, 30, file)){
        char *from = next;

    while((number = strtok_r(from, " ", &next)) != NULL){
        int i = atoi(number);
        prams[t][s] = i;
        printf("this is prams[%d][%d]: %d\n", t, s, prams[t][s]);

        s++;
        from = NULL;               
    }

    t++;
  }

  fclose(file);
}// main

Expected output:

this is prams[0][0]: 1
...
this is prams[4][4]: 5

Actual output:

this is prams[0][0]: 1
this is prams[0][1]: 1
this is prams[0][2]: 1
this is prams[0][3]: 1
this is prams[0][4]: 1
program ends


Solution

  • The main problems are:

    • you never reset s to 0, so the column index keeps increasing instead of going from 0 to 4 on each line (with 5 numbers per line). From the second line on you do not write to the expected entries of the array, and you risk writing outside the array, which is undefined behavior (for example a segmentation fault)
    • you do not check that you stay within the number of columns and lines the array can hold (30 in your code); without that check you can also write outside the array, again with undefined behavior
    • you use strtok_r incorrectly: its first parameter must be non-null only for the first call on a given line (before your edit)
    • in number = strtok_r(from, " ", &next), next is modified by strtok_r even though it is also used to initialize from for the next line, so the second line is not read correctly and the execution only produces:

    this is prams[0][0]: 11
    this is prams[0][1]: 12
    this is prams[0][2]: 13
    this is prams[0][3]: 14
    this is prams[0][4]: 15
    this is prams[3][5]: 0

    with data.txt containing:

    11 12 13 14 15
    21 22 23 24 25
    31 32 33 34 35
    41 42 43 44 45
    51 52 53 54 55

    (also note the indexes [3][5], which result from never resetting s)
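    The tokenizing problems listed above can be sketched in isolation. This is only an illustration under stated assumptions: the helper name parse_line is hypothetical (it does not appear in the question), and strtok_r is a POSIX function, so a feature-test macro may be needed depending on the compiler flags. The sketch gives strtok_r its own save pointer, passes the buffer only on the first call for each line, restarts the column count for every line, and stops before running past the destination array.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: needed for strtok_r under strict -std flags */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces and newlines and
   stores at most `max` integers in `out`. Returns how many were stored. */
int parse_line(char *line, int *out, int max)
{
    char *save = NULL;   /* owned by strtok_r; never reused as an input buffer */
    char *from = line;   /* non-NULL only for the first call on this line */
    char *token;
    int count = 0;       /* column index restarts at 0 for every line */

    while (count < max && (token = strtok_r(from, " \n", &save)) != NULL) {
        out[count++] = atoi(token);  /* atoi kept from the question; it cannot report errors */
        from = NULL;                 /* later calls continue from the saved position */
    }
    return count;
}
```

    Called once per line returned by fgets, for example parse_line(lines, prams[t], 30), the return value also tells the caller how many columns that line held.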

    Additional remarks:

    • check that fopen succeeded
    • initialize prams, or record how many columns the first line has and check that the following lines have the same number; also record how many lines were read, otherwise you do not know later which entries of the array hold numbers that were actually read
    • atoi does not indicate whether a number was actually read
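    On the atoi remark, one common alternative is strtol, which reports where parsing stopped and signals overflow through errno. The sketch below is an illustration only: the helper name to_int is hypothetical, and this is a different technique from the sscanf check used in the proposal that follows.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts `text` to an int.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int to_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text)        /* no digits were consumed */
        return 0;
    if (*end != '\0')       /* characters remain after the number */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;           /* the value does not fit in an int */

    *out = (int)value;
    return 1;
}
```

    With atoi, a token such as "abc" silently becomes 0; here the caller can tell a real 0 from a failed conversion by checking the return value.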

    A proposal that takes these remarks into account (I initialize the array with 0 and make no assumption about how many numbers each line holds):

    #include <stdio.h>
    #include <string.h>
    
    #define LINELENGTH 30
    #define SIZE 30
    
    int main(){
      // read data from data.txt
      char lines[LINELENGTH];
      FILE * file = fopen("data.txt", "r"); 
    
      if (file == NULL) {
        fprintf(stderr, "cannot read data.txt\n");
        return -1;
      }
    
      // data.txt currently holds five lines
      // 1 1 1 1 1 
      // 2 2 2 2 2
      // 3 3 3 3 3
      // 4 4 4 4 4 
      // 5 5 5 5 5
    
      int t = 0;
      int prams[SIZE][SIZE] = { 0 };
    
      while (fgets(lines, LINELENGTH, file)) {
        char * number;
        char * str = lines;
        int s = 0;
    
        while ((number = strtok(str, " \n")) != NULL) {
          char c;
          int i;
    
          if (sscanf(number, "%d%c", &i, &c) != 1) {
            fprintf(stderr, "invalid number '%s'\n", number);
            return -1;
          }
          prams[t][s] = i;
          printf("this is prams[%d][%d]: %d\n", t, s, prams[t][s]);
          str = NULL;
          if (++s == SIZE)
            break;
        }
    
        if (++t == SIZE)
          break;
      }
    
      fclose(file);
    }// main
    

    I use sscanf(number, "%d%c", &i, &c) != 1 to easily detect whether a token is a number and nothing but a number. Note that I also added \n to the separators given to strtok.
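    The comparison with 1 works because sscanf returns the number of conversions it completed: a clean token gives 1 (the %d matched and nothing was left for %c), a token with trailing characters gives 2, and a token that does not start with a number gives 0. A small sketch of that check on its own, where is_number is a hypothetical wrapper name:

```c
#include <stdio.h>

/* Hypothetical wrapper around the check used in the proposal:
   returns 1 only when `token` is an integer with nothing after it. */
int is_number(const char *token, int *out)
{
    char extra;  /* receives any character left after the digits */

    return sscanf(token, "%d%c", out, &extra) == 1;
}
```

    For instance, is_number("12", &i) accepts the token, while is_number("12a", &i) and is_number("abc", &i) both reject it.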

    Compilation and execution:

    pi@raspberrypi:/tmp $ !g
    gcc -pedantic -Wall -Wextra l.c
    pi@raspberrypi:/tmp $ cat data.txt 
    11 12 13 14 15
    21 22 23 24 25
    31 32 33 34 35
    41 42 43 44 45 
    51 52 53 54 55
    pi@raspberrypi:/tmp $ ./a.out
    this is prams[0][0]: 11
    this is prams[0][1]: 12
    this is prams[0][2]: 13
    this is prams[0][3]: 14
    this is prams[0][4]: 15
    this is prams[1][0]: 21
    this is prams[1][1]: 22
    this is prams[1][2]: 23
    this is prams[1][3]: 24
    this is prams[1][4]: 25
    this is prams[2][0]: 31
    this is prams[2][1]: 32
    this is prams[2][2]: 33
    this is prams[2][3]: 34
    this is prams[2][4]: 35
    this is prams[3][0]: 41
    this is prams[3][1]: 42
    this is prams[3][2]: 43
    this is prams[3][3]: 44
    this is prams[3][4]: 45
    this is prams[4][0]: 51
    this is prams[4][1]: 52
    this is prams[4][2]: 53
    this is prams[4][3]: 54
    this is prams[4][4]: 55