What does a line count of a binary file mean?


$ wc -l bitmap.bmp
12931 bitmap.bmp

I would guess that a binary file is like a stream, with no lines in it. So what does it mean to talk about lines in a binary file?

(note: "wc -l" counts the lines in a file)

Alex Taylor pointed out below, as I suspected, that wc is counting the number of '\n' characters in the file.

So the question becomes: do the '\n' characters that wc finds appear only as a side effect of translating binary to text, or do they actually exist in the binary file, as something like b'\n' in Python? And if they do, why would someone use the newline character in a binary file?


Solution

  • It's the number of newline characters ('\n') in the data.

    Looking at the source code for macOS's wc, we see the following code:

    if (doline) {
        while ((len = read(fd, buf, buf_size))) {
            if (len == -1) {
                warn("%s: read", file);
                (void)close(fd);
                return (1);
            }
            charct += len;
            for (p = buf; len--; ++p)
                if (*p == '\n')
                    ++linect;
        }
    

    It does a buffered read of the file, then loops through the data, incrementing a counter if it finds a '\n'.
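
For readers more at home in Python, the same idea can be sketched roughly as follows. This is not the wc source, only an illustration of the loop above; the function name `count_newlines` and the default buffer size are assumptions made for the example.

```python
import io


def count_newlines(stream, buf_size=64 * 1024):
    """Count 0x0A bytes in a binary stream, reading it in fixed-size chunks.

    Hypothetical helper mirroring the C loop: returns (line_count, byte_count).
    """
    linect = 0
    charct = 0
    while True:
        buf = stream.read(buf_size)  # buffered read, like read(fd, buf, buf_size)
        if not buf:                  # empty bytes object means end of data
            break
        charct += len(buf)
        linect += buf.count(b"\n")   # no decoding: just count bytes equal to 0x0A
    return linect, charct


# Made-up binary data that happens to contain three 0x0A bytes.
data = bytes([0x42, 0x4D, 0x0A, 0x00, 0xFF, 0x0A, 0x0A])
lines, size = count_newlines(io.BytesIO(data), buf_size=4)
print(lines, size)  # 3 7
```

For a real file you would pass something like `open("bitmap.bmp", "rb")` instead of the `io.BytesIO` object; opening in binary mode matters, because text mode would try to decode the bytes first.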

    The GNU version of wc contains similar code:

    /* Increase character and, if necessary, line counters */
    #define COUNT(c)       \
          ccount++;        \
          if ((c) == '\n') \
            lcount++;
    

    As for why a binary file contains newline characters: nobody put them there as line breaks. They are just another byte value (0x0A in ASCII, which the most common operating systems use). There is nothing special about that byte unless the file is being interpreted as text. Likewise, the byte values for tabs, digits and all the other 'text' characters will also appear in a binary file. This is why using cat on a binary file can make a terminal beep wildly: it is trying to display the BEL character (0x07). Text is only text by convention.
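
To make the "just another value" point concrete, here is a small Python sketch. The pixel bytes are invented for illustration and are not taken from any real bitmap.

```python
import struct

# b"\n" is nothing more than the single byte 0x0A.
assert b"\n"[0] == 0x0A

# A 32-bit little-endian integer with the value 10 is stored as 0A 00 00 00,
# so its first byte is indistinguishable from a newline character.
packed = struct.pack("<I", 10)
print(packed)  # b'\n\x00\x00\x00'

# Hypothetical raw pixel data: nine channel values, some of which happen
# to coincide with ASCII control or text characters.
pixels = bytes([10, 200, 10, 7, 9, 48, 255, 10, 0])

newline_count = pixels.count(b"\n")  # bytes equal to 0x0A (value 10)
tab_count = pixels.count(b"\t")      # bytes equal to 0x09 (value 9)
bell_count = pixels.count(b"\x07")   # bytes equal to 0x07, the BEL character
zero_digit_count = pixels.count(b"0")  # bytes equal to 0x30 (value 48), ASCII '0'

print(newline_count, tab_count, bell_count, zero_digit_count)  # 3 1 1 1
```

None of those bytes were written as text. wc -l would still report three "lines" for this data, simply because three of the values equal 0x0A.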