linux bash awk process-substitution

Passing the output of tr as the second input file to awk


My command:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 file2

The problem is that file2 contains \000 (NUL) characters, so awk treats it as a binary file.

Replacing \000 with a space character:

tr '\000' ' ' < file2 > file2_not_binary

solves the binary file problem.

However, file2 is a 20 GB file, and I don't want to run tr separately and save the result as another file. I want to pass the output of tr directly to awk.

I have tried:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 < (tr '\000' ' ' < file2)

But the result is:

The system cannot find the file specified. 

Another question: can my memory or awk handle such a big file at once? I'm working on a PC with 12 GB of RAM.

EDIT

One of the answers works as I expected (credit to Ed Morton):

tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -

However, it is about 2 times slower than doing the same thing in 2 steps: first replacing \000 and saving the result, then using awk to search. How can I speed it up?

EDIT2

My bad. Ed Morton's solution is actually a little faster than running the two commands separately.

Two commands separately: 08:37:053

Two commands piped: 08:07:204


Solution

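  • Since awk isn't storing your second file in memory, the size of that file is irrelevant except for speed of execution. Only the lines of file1 are kept in memory (as keys of the array a); file2 is read one record at a time. Try this: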

    tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -
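
To see the pipeline work on something smaller than 20 GB, here is a minimal sketch on made-up data. The scratch directory, the key values (ABC123, XYZ999) and the record layout (49 filler characters, a 6-character key, a NUL byte, then trailing text) are assumptions for illustration only, not the asker's real files.

```shell
# Hypothetical sample data in a scratch directory (names are illustrative only).
dir=$(mktemp -d)
printf 'ABC123\n' > "$dir/file1"

# Two records: 49 filler characters, a 6-character key at columns 50-55,
# then a NUL byte followed by trailing text.
printf '%049d%s\000tail\n' 0 ABC123 >  "$dir/file2"
printf '%049d%s\000tail\n' 0 XYZ999 >> "$dir/file2"

# tr streams file2 with NULs turned into spaces; "-" tells awk to read that
# stream from standard input after it has loaded the keys from file1.
result=$(tr '\000' ' ' < "$dir/file2" |
  awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' "$dir/file1" -)

printf '%s\n' "$result"
rm -rf "$dir"
```

Only the record whose columns 50-55 match a line of file1 should be printed, with its NUL replaced by a space. In bash specifically, the attempt from the question can also be written with process substitution, which needs no space between < and ( and no extra redirection: awk '...' file1 <(tr '\000' ' ' < file2). This form is bash syntax; it is not available in every shell.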