
Reading lines of fixed size from a file in C


I want to process a file line by line in C. Every line in the file must be exactly 100 characters long; if a line exceeds this or is empty, I want to print the number of the line in error and continue to the next line.

I'm using this, but it doesn't work:

int maxLineLen = 101; // 100 + 1 for the '\n' end of line
char myBuffer[101];
FILE *myFile;

myFile = fopen("dataFile.txt", "r");

while (fgets(myBuffer, maxLineLen, myFile) != NULL) {
     // I can't figure out how to detect and print empty or error lines
}

Thanks for the help.

Edit: I added this sample of my file:

                                                            // Empty line : Wrong line
FirstName-Paolo-LastName-Roberto-Age-23-Address-45,abcdefghijklmnopqrst-CustomerId-xxxxxxxxxxxxxxxx // Correct line
FirstName-Juliana-LastName-Mutti-Age-35-Address-28,abcdefghijklmnopqrst-CustomerId-xxxxxxxxxxxxxxxABCDEFGHIJx // Exceeds the length : Wrong line
FirstName-David-LastName-Lazardi-Age-59-Address-101,abcdefghijklmnopqrst-CustomerId // Short length : Wrong line

When I run my program I should get:

Line 1 : ERROR
Line 3 : ERROR
Line 4 : ERROR

Solution

  • Since you need to detect both underlength and overlength lines reliably, and resynchronize your input after either, it is probably easiest to write a function that uses getc() to read the data.

    Your standard function options include:

    • fgets() — won't read too much data, but you'd have to determine whether it got a newline (which would be included in the input) and deal with resynchronization when reading an over-length line (not very difficult).
    • fread() — will read exactly the right length, and would be a good choice if you think overlength and underlength lines will be vanishingly rare occurrences. Resynchronization after an error is anything but trivial, especially if you get adjacent erroneous lines.
    • getline() — from POSIX 2008. Allocates sufficient memory for the length of line it reads, which is a little wasteful if you're simply going to discard over-length lines.

    Since none of these fits the job well, you end up writing your own.
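
For comparison, the fgets() option from the list above can be sketched as follows. This is only a minimal sketch, not the tested code from this answer: report_bad_lines, LINE_LEN and the DEMO_MAIN macro are hypothetical names chosen for illustration, and the "Line N : ERROR" format is taken from the question. The buffer holds LINE_LEN characters plus the newline and the terminating NUL, so if no newline ends up in the buffer, the line was too long (or is an unterminated last line) and the rest of it is skipped with getc().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { LINE_LEN = 100 };    /* required line length, taken from the question */

/* Hypothetical helper: writes "Line N : ERROR" to out for every line of in
   whose length (excluding the newline) is not LINE_LEN.
   Returns the number of bad lines. */
int report_bad_lines(FILE *in, FILE *out)
{
    char buf[LINE_LEN + 2];             /* data + '\n' + '\0' */
    int lineno = 0;
    int bad = 0;

    assert(in != NULL && out != NULL);
    while (fgets(buf, (int)sizeof buf, in) != NULL)
    {
        size_t len = strlen(buf);

        lineno++;
        if (len > 0 && buf[len - 1] == '\n')
            len--;                      /* complete line: do not count the newline */
        else
        {
            /* No newline in the buffer: resynchronize by skipping to the
               end of the line, counting the extra characters. */
            int c;
            while ((c = getc(in)) != EOF && c != '\n')
                len++;
        }
        if (len != LINE_LEN)
        {
            fprintf(out, "Line %d : ERROR\n", lineno);
            bad++;
        }
    }
    return bad;
}

#ifdef DEMO_MAIN    /* hypothetical macro: compile with -DDEMO_MAIN for a demo */
int main(void)
{
    return report_bad_lines(stdin, stdout) > 0;
}
#endif /* DEMO_MAIN */
```

Compiled with -DDEMO_MAIN, the program checks standard input, for example ./a.out < dataFile.txt. Note that this sketch accepts an unterminated last line if it is exactly LINE_LEN characters long.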

    Now tested code. (The fix was needed in the first if as diagnosed by Dave. The trouble was I originally wrote the inverse condition (if ((c = getc(fp)) != EOF && c != '\n')), then got distracted after I inverted the logic, leading to an 'incomplete inversion' of the condition.)

    The key parts of this are the two while loops.

    The first while loop is the normal operation: it reads and stores up to linelen characters, counting as it goes, and stops early if it sees a newline or EOF. If the line is short, the count will indicate that. Note the < condition: the loop stops once linelen characters are stored, without reading the newline, so for a line of exactly the right length c still holds the last data character.

    The second while loop runs only when the first loop ended without seeing a newline or EOF. It reads to the end of the line and discards the results, counting each extra character; for a line of exactly the right length it consumes just the newline and leaves the count unchanged. It uses x instead of c because c is needed in the return statement.

    /*
    @(#)File:           $RCSfile: rdfixlen.c,v $
    @(#)Version:        $Revision: 1.2 $
    @(#)Last changed:   $Date: 2012/04/01 00:15:43 $
    @(#)Purpose:        Read fixed-length line
    @(#)Author:         J Leffler
    */
    
    /* Inspired by https://stackoverflow.com/questions/9957006 */
    
    #include <stdio.h>
    #include <assert.h>
    
    extern int read_fixed_length_line(FILE *fp, char *buffer, int linelen);
    
    /* Read line of fixed length linelen characters followed by newline. */
    /* Buffer must have room for trailing NUL (newline is not included). */
    /* Returns length of line that was read (excluding newline), or EOF. */
    int read_fixed_length_line(FILE *fp, char *buffer, int linelen)
    {
        int count = 0;
        int c;
        assert(fp != 0 && buffer != 0 && linelen > 0);
        while (count < linelen)
        {
            if ((c = getc(fp)) == EOF || c == '\n')
                break;
            buffer[count++] = c;
        }
        buffer[count] = '\0';
        if (c != EOF && c != '\n')
        {
            /* Gobble overlength characters on line */
            int x;
            while ((x = getc(fp)) != EOF && x != '\n')
                count++;
        }
        return((c == EOF) ? EOF : count);
    }
    
    #ifdef TEST
    
    #include <string.h>
    
    int main(void)
    {
        enum { MAXLINELEN = 10 };
        int actlen;
        char line[16];
        int lineno = 0;
        memset(line, '\0', sizeof(line));
    
        while ((actlen = read_fixed_length_line(stdin, line, MAXLINELEN)) != EOF)
        {
            lineno++;
            if (actlen != MAXLINELEN)
            {
                if (actlen > MAXLINELEN)
                    printf("%2d:L: length %2d <<%s>>\n", lineno, actlen, line);
                else
                    printf("%2d:S: length %2d <<%s>>\n", lineno, actlen, line);
            }
            else
                printf("%2d:R: length %2d <<%s>>\n", lineno, actlen, line);
            assert(line[MAXLINELEN-0] == '\0');
            assert(line[MAXLINELEN+1] == '\0');
        }
        return 0;
    }
    
    #endif /* TEST */
    

    Test data and output

    $ cat xxx
    
    abcdefghij
    a
    Abcdefghij
    ab
    aBcdefghij
    abc
    abCdefghij
    abcd
    abcDefghij
    abcde
    abcdEfghij
    abcdef
    abcdeFghij
    abcdefg
    abcdefGhij
    abcdefgh
    abcdefgHij
    abcdefghi
    abcdefghIj
    abcdefghiJ
    abcdefghiJ1
    AbcdefghiJ
    abcdefghiJ12
    aBcdefghiJ
    abcdefghiJ123
    $ ./rdfixlen < xxx
     1:S: length  0 <<>>
     2:R: length 10 <<abcdefghij>>
     3:S: length  1 <<a>>
     4:R: length 10 <<Abcdefghij>>
     5:S: length  2 <<ab>>
     6:R: length 10 <<aBcdefghij>>
     7:S: length  3 <<abc>>
     8:R: length 10 <<abCdefghij>>
     9:S: length  4 <<abcd>>
    10:R: length 10 <<abcDefghij>>
    11:S: length  5 <<abcde>>
    12:R: length 10 <<abcdEfghij>>
    13:S: length  6 <<abcdef>>
    14:R: length 10 <<abcdeFghij>>
    15:S: length  7 <<abcdefg>>
    16:R: length 10 <<abcdefGhij>>
    17:S: length  8 <<abcdefgh>>
    18:R: length 10 <<abcdefgHij>>
    19:S: length  9 <<abcdefghi>>
    20:R: length 10 <<abcdefghIj>>
    21:R: length 10 <<abcdefghiJ>>
    22:L: length 11 <<abcdefghiJ>>
    23:R: length 10 <<AbcdefghiJ>>
    24:L: length 12 <<abcdefghiJ>>
    25:R: length 10 <<aBcdefghiJ>>
    26:L: length 13 <<abcdefghiJ>>
    $
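
To connect this back to the question, the function can be driven with a line length of 100 to print the "Line N : ERROR" messages the asker expects. The following is a hedged sketch under a few assumptions: check_file, LINE_LEN and the DEMO_MAIN macro are hypothetical names, and read_fixed_length_line() is repeated from the code above (with c initialized, only to avoid a compiler warning) so that the block compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

enum { LINE_LEN = 100 };    /* required line length, taken from the question */

/* Repeated from the answer above; returns the line length or EOF. */
int read_fixed_length_line(FILE *fp, char *buffer, int linelen)
{
    int count = 0;
    int c = EOF;
    assert(fp != 0 && buffer != 0 && linelen > 0);
    while (count < linelen)
    {
        if ((c = getc(fp)) == EOF || c == '\n')
            break;
        buffer[count++] = c;
    }
    buffer[count] = '\0';
    if (c != EOF && c != '\n')
    {
        /* Gobble overlength characters on line */
        int x;
        while ((x = getc(fp)) != EOF && x != '\n')
            count++;
    }
    return((c == EOF) ? EOF : count);
}

/* Hypothetical driver: writes "Line N : ERROR" to out for each line of in
   whose length is not LINE_LEN. Returns the number of bad lines. */
int check_file(FILE *in, FILE *out)
{
    char line[LINE_LEN + 1];            /* data + '\0' */
    int lineno = 0;
    int bad = 0;
    int len;

    while ((len = read_fixed_length_line(in, line, LINE_LEN)) != EOF)
    {
        lineno++;
        if (len != LINE_LEN)
        {
            fprintf(out, "Line %d : ERROR\n", lineno);
            bad++;
        }
    }
    return bad;
}

#ifdef DEMO_MAIN    /* hypothetical macro: compile with -DDEMO_MAIN for a demo */
int main(void)
{
    return check_file(stdin, stdout) > 0;
}
#endif /* DEMO_MAIN */
```

One consequence of the function's return convention is worth knowing: a final line that is shorter than LINE_LEN and has no terminating newline is returned as EOF, so this driver does not report it.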