Tags: c, file, parsing, text-parsing

Parsing a file while reading in C


I am trying to read each line of a file and store the binary values in appropriate variables. I can see that many other people have done similar things, and I have spent two days testing different approaches I found, but I am still having difficulty getting my version to work as needed.

I have a txt file with the following format:

in = 00000000000, out = 0000000000000000
in = 00000000001, out = 0000000000001111
in = 00000000010, out = 0000000000110011
......

I'm attempting to use fscanf to consume the unwanted characters "in = ", "," and "out = " and keep only the characters that represent binary values.

My goal is to store the first column of binary values (the "in" values) in one variable and the second column (the "out" values) in another buffer variable.

I have managed to get fscanf to consume the "in" and "out" characters, but I have not been able to figure out how to get it to consume the "," and "=" characters. Additionally, I thought that fscanf would skip the white space, but it doesn't appear to be doing that either.

I can't find a comprehensive list of the directives available for scanf beyond the basic "%d", "%s", "%c" and so on, and it seems that filtering out the characters I'm trying to ignore needs a more complex combination of directives than I know how to write.

I could use some help with figuring this out. I would appreciate any guidance you could provide to help me understand how to properly filter out "in = " and ", out = " and how to store the two columns of binary characters into two separate variables.

Here is the code I am working with at the moment. I have tried other iterations of this code using fgetc() in combination with fscanf() without success.

#include <stdio.h>

int main()
{
    FILE * f = fopen("hamming_demo.txt","r");
    char buffer[100];
    rewind(f);
    while((fscanf(f, "%s", buffer)) != EOF) {
        fscanf(f,"%[^a-z]""[^,]", buffer);
        printf("%s\n", buffer);
    }
    printf("\n");
    return 0;
}

The outputs from my code appear as follows:

 = 00000000000, 
 = 0000000000000000

 = 00000000001, 
 = 0000000000001111

 = 00000000010, 
 = 0000000000110011

Thank you for your time.


Solution

  • The scanf family of functions is often called a poor man's parser because it does not tolerate input errors well. But if you are sure of the format of the input data, it allows for simple code. The only magic here is that a space in the format string matches any amount of whitespace, including newlines, or none at all. Your code could become:

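    #include <stdio.h>

    int main()
    {
        FILE * f = fopen("hamming_demo.txt", "r");
        if (NULL == f) {                               // always test open
            perror("Unable to open input file");
            return 1;
        }
        char in[50], out[50];                          // directly get in and out
        // BEWARE: fscanf returns the number of successful conversions, or EOF
        // if the input ends before the first one; testing for == 2 stops the
        // loop both at the end of the file and on a malformed line.
        // %49[01] caps each field at 49 characters so it fits its buffer.
        while (fscanf(f, " in = %49[01], out = %49[01]", in, out) == 2) {
            printf("%s - %s\n", in, out);
        }
        printf("\n");
        fclose(f);
        return 0;
    }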