In a bash script, I have a file logfileA.txt that contains output from wget, and I'd like to run grep on it to check for any instances of the words "error" or "fail", etc., like so:
grep -ni --color=never -e "error" -e "fail" logfileA.txt | awk -F: '{print "Line "$1": "$2}'
# grep -n line number, -i ignore case; awk to add better format to the line numbers (https://stackoverflow.com/questions/3968103)
Trouble is, though, I think the wget output in logfileA.txt is full of characters that may be messing up the input for grep, as I'm not getting reliable matches.
Troubleshooting this, I cannot even cat the contents of the log file reliably. For instance, with cat logfileA.txt, all I get is the last line, which is garbled:
FINISHED --2019-05-29 17:08:52--me@here:/home/n$ 71913592/3871913592]atmed out). Retrying.
The contents of logfileA.txt are:
--2019-05-29 15:26:50-- http://somesite.com/somepath/a0_FooBar/BarFile.dat
Reusing existing connection to somesite.com:80.
HTTP request sent, awaiting response... 302 Found
Location: http://cdn.somesite.com/storage/a0_FooBar/BarFile.dat [following]
--2019-05-29 15:26:50-- http://cdn.somesite.com/storage/a0_FooBar/BarFile.dat
Resolving cdn.somesite.com (cdn.somesite.com)... xxx.xxx.xx.xx
Connecting to cdn.somesite.com (cdn.somesite.com)|xxx.xxx.xx.xx|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3871913592 (3.6G) [application/octet-stream]
Saving to: ‘a0_FooBar/BarFile.dat’
a0_FooBar/BarFile.dat 0%[ ] 0 --.-KB/s
a0_FooBar/BarFile.dat 0%[ ] 15.47K 70.5KB/s
...
a0_FooBar/BarFile.dat 49%[========> ] 1.80G --.-KB/s in 50m 32s
2019-05-29 16:17:23 (622 KB/s) - Read error at byte 1931163840/3871913592 (Connection timed out). Retrying.
--2019-05-29 16:17:24-- (try: 2) http://cdn.somesite.com/storage/a0_FooBar/BarFile.dat
Connecting to cdn.somesite.com (cdn.somesite.com)|xxx.xxx.xx.xx|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 3871913592 (3.6G), 1940749752 (1.8G) remaining [application/octet-stream]
Saving to: ‘a0_FooBar/BarFile.dat’
a0_FooBar/BarFile.dat 49%[+++++++++ ] 1.80G --.-KB/s
...
a0_FooBar/BarFile.dat 100%[+++++++++==========>] 3.61G 1.09MB/s in 34m 44s
2019-05-29 16:52:09 (909 KB/s) - ‘a0_FooBar/BarFile.dat’ saved [3871913592/3871913592]
FINISHED --2019-05-29 17:08:52--
I assume the problem could be the /s or ---s or >s or ==>s or |s? But since the output from wget could vary, how do I anticipate and escape anything problematic for grep?
grep -ni --color=never -e "error" -e "fail" logfileA.txt | awk -F: '{print "Line "$1": "$2}'
Line 17: 2019-05-29 16:17:23 (622 KB/s) - Read error at byte 1931163840/3871913592 (Connection timed out). Retrying.
Also, would ack be better at this job? And if so, what/how?
Wrt "I assume the problem could be the /s or ---s or >s or ==>s or |s?" - no, there's nothing special about any of those characters/strings. It sounds like you might have DOS line endings (\r\n); see "Why does my tool output overwrite itself and how do I fix it?". Since you said "with cat logfileA.txt, all I get is the last line which is garbled", I wonder if you ONLY have \rs and no \ns as line endings. If you do, then tr '\r' '\n' < logfileA.txt > tmp && mv tmp logfileA.txt would fix that. If that IS the issue, then going forward you can use awk -v RS='\r' 'script' to change the record separator from its default \n to \r, and then you won't need that tr step.
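The \r-only diagnosis and fix can be sketched end to end. This is just an illustration with made-up file contents; only the filename logfileA.txt comes from the question:

```shell
set -e
# Simulate a log captured with bare \r as the only line separator,
# as wget's progress-bar output can produce when redirected to a file.
printf 'line one\rRead error at byte 100\rline three\r' > logfileA.txt

# od -c dumps the raw characters, so you can see whether the file
# actually contains \n, \r\n, or bare \r line endings.
od -c logfileA.txt | head -n 3

# Before conversion, line-oriented tools see the whole file as one
# "line"; after swapping \r for \n they behave normally again.
tr '\r' '\n' < logfileA.txt > tmp && mv tmp logfileA.txt
grep -c 'error' logfileA.txt
```

After the conversion, grep -c reports exactly one matching line instead of treating the entire file as a single record.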
You don't need grep when you're using awk, though. This:
grep -ni --color=never -e "error" -e "fail" logfileA.txt |
awk -F: '{print "Line "$1": "$2}'
can be written as just:
awk 'tolower($0) ~ /error|fail/{print "Line "NR":"$0}' logfileA.txt
but the awk-only version is more robust, as it'll correctly display full lines that contain colons, where the grep+awk version will truncate them at the first colon.
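That truncation is easy to see with a made-up log line that contains colons after the matched word (demo.txt is a hypothetical scratch file):

```shell
# A log line that itself contains colons after the matched word.
printf 'Read error at 12:34:56 from host\n' > demo.txt

# grep -n prefixes "1:", and awk -F: splits on EVERY colon,
# so $2 is only the text up to the next colon.
grep -ni -e 'error' demo.txt | awk -F: '{print "Line "$1": "$2}'
# prints: Line 1: Read error at 12

# The awk-only version keeps the whole line intact.
awk 'tolower($0) ~ /error|fail/{print "Line "NR":"$0}' demo.txt
# prints: Line 1:Read error at 12:34:56 from host
```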
You can handle the DOS line endings, if any, by tweaking the script to:
awk 'tolower($0) ~ /error|fail/{sub(/\r$/,""); print "Line "NR":"$0}' logfileA.txt
and you can make it look for "error" or "fail" as standalone words (as opposed to parts of other strings like "terror" or "failles") by doing this with GNU awk:
awk -v IGNORECASE=1 -v RS='\r?\n' '/\<(error|fail)\>/{print "Line "NR":"$0}' logfileA.txt
or this with any awk:
awk 'tolower($0) ~ /(^|[^[:alnum:]_])(error|fail)([^[:alnum:]_]|$)/{sub(/\r$/,""); print "Line "NR":"$0}' logfileA.txt
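To illustrate the standalone-word matching with the portable (any-awk) version above, here's a quick check against made-up input (demo.txt is a hypothetical scratch file); only the line where "error" stands alone matches, not the one containing "terror":

```shell
# Three sample lines: "terror" (should NOT match), a real "error"
# line (should match), and a line with neither word.
printf 'a night of terror\nRead error at byte 5\nall tests pass\n' > demo.txt

# The character-class guards require a non-word character (or line
# boundary) on each side of "error"/"fail", so "terror" is skipped.
awk 'tolower($0) ~ /(^|[^[:alnum:]_])(error|fail)([^[:alnum:]_]|$)/{sub(/\r$/,""); print "Line "NR":"$0}' demo.txt
# prints: Line 2:Read error at byte 5
```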