Fix Mismatch Between Data And Locale In Awk Command


I am receiving the following error:

awk: cmd. line:1: (FILENAME=- FNR=798) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.

The command I'm running is the following:

cat file.txt | awk 'length($0)<10000' > output-file.txt

The weird part is that if I pipe to other commands like awk '{ sub("\r$", ""); print }', it works just fine without an error.

Anyone see why I would get this error? Or, should I just ignore it?


Solution

  • Set the locale to C so that awk treats the input as single-byte characters (ASCII character set) and no longer tries to decode multibyte sequences. To do that, pass LC_ALL=C in awk's environment:

    LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt
    

    Also, you don't need cat here, since awk takes filename(s) as argument(s).
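
One side effect worth knowing about: with LC_ALL=C, length($0) counts bytes rather than characters, so the 10000 limit in the command above becomes a byte limit. Here is a minimal sketch of that behavior. The sample string is made up for illustration and is not taken from the asker's file; \303\251 is the two-byte UTF-8 encoding of "é".

```shell
# Hypothetical sample line: "café" in UTF-8 is 4 characters but 5 bytes.
# Under LC_ALL=C, awk treats every byte as one character, so length() reports bytes.
bytes=$(printf 'caf\303\251\n' | LC_ALL=C awk '{ print length($0) }')
echo "length under LC_ALL=C: $bytes"

# The same kind of filter as in the answer, with a small limit for illustration:
# the line is kept only if it is shorter than 5 bytes, so nothing is printed here.
kept=$(printf 'caf\303\251\n' | LC_ALL=C awk 'length($0)<5')
echo "lines kept: [$kept]"
```

If the file is mostly ASCII, the difference is negligible, but for text with many multibyte characters a byte limit is reached sooner than a character limit would be.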