
Print the last 10 lines of a file or stdin with read, write and lseek


I'm working on an implementation of the tail function and I'm only supposed to use read(), write() and lseek() for I/O, and so far I have this:

int printFileLines(int fileDesc)
{
    char c; 
    int lineCount = 0, charCount = 0;   
    int pos = 0, rState;
    while(pos != -1 && lineCount < 10)
    {
        if((rState = read(fileDesc, &c, 1)) < 0)
        {
            perror("read:");
        }
        else if(rState == 0) break;
        else
        {
            if(pos == -1)
            {
                pos = lseek(fileDesc, 0, SEEK_END);
            }
            pos--;
            pos=lseek(fileDesc, pos, SEEK_SET); 
            if (c == '\n')
            {
                lineCount++;
            }
            charCount++;
        }
    }

    if (lineCount >= 10)
        lseek(fileDesc, 2, SEEK_CUR);
    else
        lseek(fileDesc, 0, SEEK_SET);

    char *lines = malloc(charCount - 1 * sizeof(char));

    read(fileDesc, lines, charCount);
    lines[charCount - 1] = 10;
    write(STDOUT_FILENO, lines, charCount);

    return 0;
}

So far it works for files that have more than 10 lines, but it breaks when I pass a file with fewer than 10 lines (it just prints the last line of that file), and I can't get it to work with stdin. If someone can give me an idea how to fix these issues, that'd be great :D


Solution

  • The first issue:

    If you read a newline here ...

    if(read(fileDesc, &c, 1) < 0)
    {
        perror("read:");
    }
    

    ... and then set the position directly to the character preceding that newline ...

    pos--;
    pos=lseek(fileDesc, pos, SEEK_SET);
    

    ... and lineCount then reaches 10 (so the while loop terminates), the next character you read is the last character of the line preceding that newline. The newline itself isn't part of the last 10 lines either, so skip two characters forward from the current file offset:

    if (lineCount >= 10)
        lseek(fileDesc, 2, SEEK_CUR);
    

    The second issue:

    Let's assume that the file offset has reached the beginning of the file:

    pos--;
    pos=lseek(fileDesc, pos, SEEK_SET); // pos is now 0
    

    The while condition is still true:

    while(pos != -1 && lineCount < 10)
    

    Now a char is read. After this, the file offset is 1 (the second character):

    if(read(fileDesc, &c, 1) < 0)
    {
        perror("read:");
    }
    

    Here, pos drops to -1 and lseek will fail:

    pos--;
    pos=lseek(fileDesc, pos, SEEK_SET); 
    

    Since lseek has failed, the file offset is still 1 (the second character), so the first character goes missing. Fix this by resetting the file offset to the beginning of the file after the while loop if it ended with pos == -1 rather than with 10 lines found:

    if (lineCount >= 10)
        lseek(fileDesc, 2, SEEK_CUR);
    else
        lseek(fileDesc, 0, SEEK_SET);
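
    One way to sidestep the offset bookkeeping described above is to work out where the tail starts first, and only then copy from that offset to stdout. The following is a minimal sketch rather than a fix of the original function: tailStart and printTail are hypothetical names, only read(), write() and lseek() are used for I/O, and the descriptor is assumed to refer to a seekable regular file (on a pipe, lseek fails with ESPIPE and both functions return -1).

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the offset at which the last `n` lines
 * begin, or -1 on error. Assumes `fd` is seekable (a regular file). */
off_t tailStart(int fd, int n)
{
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0)
        return -1;                 /* e.g. ESPIPE for a pipe or terminal */
    if (n <= 0)
        return size;               /* nothing to print */

    int newlines = 0;
    char c;
    for (off_t pos = size - 1; pos >= 0; pos--) {
        if (lseek(fd, pos, SEEK_SET) < 0 || read(fd, &c, 1) != 1)
            return -1;
        /* A newline at the very end terminates the last line;
         * it does not separate two lines, so it is not counted. */
        if (c == '\n' && pos != size - 1 && ++newlines == n)
            return pos + 1;        /* tail starts right after this newline */
    }
    return 0;                      /* fewer than n lines: print everything */
}

/* Hypothetical helper: copies the last `n` lines of `fd` to stdout. */
int printTail(int fd, int n)
{
    char buf[4096];
    ssize_t got;
    off_t start = tailStart(fd, n);

    if (start < 0 || lseek(fd, start, SEEK_SET) < 0)
        return -1;
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < got) {       /* write() may write less than requested */
            ssize_t w = write(STDOUT_FILENO, buf + done, (size_t)(got - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
    return got < 0 ? -1 : 0;
}
```

    Because the start offset is never allowed to go below 0, the "fewer than 10 lines" case needs no special handling after the loop.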
    

    Performance:

    This approach needs a large number of system calls: one read() and one lseek() per character. An easy improvement would be to use the buffered f* functions:

    FILE *f = fdopen(fileDesc, "r");
    fseek(...);
    fgetc(...);
    

    and so on. Additionally, apart from fdopen() (which is POSIX), these are standard C functions rather than system-specific ones.
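
    As a rough illustration of the stdio variant, the sketch below walks backwards with fseek() and fgetc(). The names seekToTail and printTailStdio are hypothetical, and the stream is assumed to be seekable. How many system calls this actually saves depends on how the C library handles fseek(): some implementations discard the buffer on every seek.

```c
#include <stdio.h>

/* Hypothetical helper: leaves `f` positioned at the start of its last
 * `n` lines. Returns 0 on success, -1 on error. */
int seekToTail(FILE *f, int n)
{
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    if (size < 0)
        return -1;
    if (n <= 0)
        return 0;                  /* stay at the end: nothing to print */

    int newlines = 0;
    for (long pos = size - 1; pos >= 0; pos--) {
        if (fseek(f, pos, SEEK_SET) != 0)
            return -1;
        int c = fgetc(f);          /* after this, the position is pos + 1 */
        if (c == EOF)
            return -1;
        /* Ignore a newline that merely terminates the last line. */
        if (c == '\n' && pos != size - 1 && ++newlines == n)
            return 0;              /* already positioned just past it */
    }
    /* Fewer than n lines: the tail is the whole file. */
    return fseek(f, 0, SEEK_SET) == 0 ? 0 : -1;
}

/* Hypothetical helper: prints the last `n` lines of `f` to stdout. */
int printTailStdio(FILE *f, int n)
{
    int c;
    if (seekToTail(f, n) != 0)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (putchar(c) == EOF)
            return -1;
    return ferror(f) ? -1 : 0;
}
```

    With a descriptor in hand, this could be called as printTailStdio(fdopen(fileDesc, "r"), 10), after checking that fdopen() did not return NULL.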

    Even better would be to read the file backwards chunk by chunk and operate on these chunks, but this needs some more coding effort.
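
    A sketch of the chunked approach, still using only lseek() and read(), might look like the following. The name tailStartChunked and the 4096-byte CHUNK size are assumptions made for illustration, and the descriptor is again assumed to be a seekable regular file.

```c
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096   /* arbitrary block size chosen for this sketch */

/* Hypothetical helper: returns the offset at which the last `n` lines
 * begin, scanning the file backwards one chunk at a time. -1 on error. */
off_t tailStartChunked(int fd, int n)
{
    char buf[CHUNK];
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0)
        return -1;
    if (n <= 0)
        return size;

    int newlines = 0;
    off_t end = size;              /* end of the region not yet scanned */
    while (end > 0) {
        size_t len = end < CHUNK ? (size_t)end : CHUNK;
        off_t begin = end - (off_t)len;

        if (lseek(fd, begin, SEEK_SET) < 0)
            return -1;
        /* read() may return fewer bytes than asked for; fill the chunk. */
        size_t got = 0;
        while (got < len) {
            ssize_t r = read(fd, buf + got, len - got);
            if (r <= 0)
                return -1;
            got += (size_t)r;
        }
        /* Scan this chunk from its last byte to its first. */
        for (size_t i = len; i-- > 0; ) {
            off_t at = begin + (off_t)i;
            if (buf[i] == '\n' && at != size - 1 && ++newlines == n)
                return at + 1;
        }
        end = begin;               /* continue with the preceding chunk */
    }
    return 0;                      /* fewer than n lines in the file */
}
```

    Compared with the one-byte-at-a-time loop, this issues roughly one lseek() and one read() per 4096 bytes scanned instead of one pair per character.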

    For Unix, you could also mmap() the whole file and search backwards in memory for newline characters.
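
    The mmap() idea could be sketched as below. This is an illustration under assumptions, not a drop-in solution: tailStartMapped is a hypothetical name, the descriptor must refer to a regular file, and the whole file must fit in the address space. Note that it goes beyond the read()/write()/lseek() restriction stated in the question.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps the file read-only and searches backwards in
 * memory for the start of the last `n` lines. Returns -1 on error. */
off_t tailStartMapped(int fd, int n)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                  /* mmap() rejects a zero-length mapping */
    if (n <= 0)
        return st.st_size;

    size_t size = (size_t)st.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;

    off_t start = 0;               /* default: fewer than n lines */
    int newlines = 0;
    for (size_t i = size; i-- > 0; ) {
        /* Skip a newline that only terminates the final line. */
        if (data[i] == '\n' && i != size - 1 && ++newlines == n) {
            start = (off_t)i + 1;
            break;
        }
    }
    munmap(data, size);
    return start;
}
```

    The returned offset can then be passed to lseek() before copying the rest of the file to stdout; alternatively, the bytes from data + start could be written out directly before the mapping is released.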