
Shell: how to determine that no lines in a text file exceed 2048 bytes in length


I realize it's very unlikely that a single line in a text file would ever organically exceed 2048 bytes. But I still think it would be valuable to know how to make sure that isn't the case.

Edit: The reason I asked is that I'm writing a script that verifies a file is a text file as defined by POSIX. One of the requirements is that no line in a text file shall exceed {LINE_MAX} bytes in length, including the newline. On Ubuntu and FreeBSD this value is 2048.
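The {LINE_MAX} value can be queried with getconf. Because the limit counts the trailing newline, the bytes before the newline may number at most LINE_MAX - 1. A minimal sketch (the variable names are illustrative only):

```shell
#!/bin/sh
# Ask the system for its {LINE_MAX}; POSIX requires it to be at least 2048.
line_max=$(getconf LINE_MAX)
# The limit includes the trailing newline, so the content of a line
# may be at most LINE_MAX - 1 bytes long.
max_content=$((line_max - 1))
printf 'LINE_MAX is %s; a line may hold at most %s bytes before its newline\n' \
    "$line_max" "$max_content"
```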

On GNU/Linux you need not worry about this limit, since the GNU tools accept lines bounded only by available memory. FreeBSD, however, does impose this limit, and I've recently made a serious effort to learn FreeBSD, so I think it's important for me to be able to do this.

Edit: I think I was wrong about FreeBSD. Its grep can process lines longer than 2048 bytes.


Solution

  • This matches any line with at least 2049 bytes before the newline, counting bytes literally:

    LANG=C grep -E '^.{2049}' some.txt
    

    For example, é is two bytes in UTF-8, so in the C locale it matches .{2}:

    $ printf é | LANG=C grep -E '^.{2}'
    é
    

    If you mean characters rather than bytes, set LANG to a UTF-8 locale, or leave it unset to rely on your environment's default locale:

    $ printf é | LANG=en_US.utf8 grep -E '^.{2}'
    $ echo $?
    1
    

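    If you mean graphemes (user-perceived characters), use PCRE's \X in a UTF-8 locale; this needs a grep that supports -P, such as most builds of GNU grep: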

    $ printf 🐚 | grep -P '^\X{2}'
    $ echo $?
    1
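The byte-count check can be wrapped in a reusable function for a validation script. This is a minimal sketch, assuming LINE_MAX is 2048; the function name lines_fit is hypothetical. It differs from the one-liner in three deliberate ways: it sets LC_ALL=C instead of LANG=C, because LC_ALL overrides LANG and any LC_CTYPE already set in the environment; it matches 2048 content bytes, because the POSIX limit counts the newline; and it writes the count as (.{128}){16}, because POSIX only guarantees interval counts up to RE_DUP_MAX, whose minimum is 255, so a bound like {2049} may be rejected by some non-GNU greps.

```shell
#!/bin/sh
# lines_fit FILE  (hypothetical helper; assumes LINE_MAX is 2048)
# Returns 0 if every line of FILE is at most 2048 bytes including its
# newline, 1 if some line is longer, and 2 if grep itself fails.
lines_fit() {
    # LC_ALL=C makes '.' match exactly one byte and overrides any locale
    # variables already set in the environment.
    # '^(.{128}){16}' matches 2048 bytes of content, i.e. a line that is
    # at least 2049 bytes once its newline is counted. Splitting the count
    # keeps every interval below POSIX's minimum RE_DUP_MAX of 255.
    LC_ALL=C grep -q -E '^(.{128}){16}' "$1"
    case $? in
        0) return 1 ;;  # a match means some line is too long
        1) return 0 ;;  # no match: every line fits
        *) return 2 ;;  # grep failed, e.g. the file is unreadable
    esac
}
```

For example, `if lines_fit some.txt; then echo ok; fi`. Replacing -q with -n would instead print the offending lines prefixed with their line numbers.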