Hello, I am trying to match a pattern across some HTML files using awk, but I'm not having any luck with it.
The files contain a block like the following:
<tr>
<td>Failures</td>
<td>0</td>
</tr>
<tr>
<td>Warnings</td>
<td>4</td>
</tr>
<tr>
<td>Errors</td>
<td>0</td>
</tr>
<tr>
<td>Not Applicable</td>
<td>53</td>
</tr>
<tr>
<td>Manual Checks</td>
<td>9</td>
</tr>
Failures and Manual Checks should both be zero. In the file above, Failures is 0 but Manual Checks is 9, so it should not match. I need to match only when Failures is 0 and Manual Checks is 0.
So I tried with and without escaping the newline, but awk is not returning any results.
find . -name "*.html" -exec awk '/td\>Failures\<\/td\>\\n.*\<td\>0/ {print FILENAME}' '{}' \;
I have also tried other combinations like the one below, but I can't figure out why awk is not matching across to the next line.
find . -name "*.html" -exec awk '/td\>Failures\<\/td\>\\n\[\^\\\<\]\+\<td\>0/ {print FILENAME}' '{}' \;
Can anyone please have a look and tell me what I am missing?
A more reliable solution is going to be based on a tool designed to parse HTML; having said that ...
One awk idea using a couple of custom regex patterns:
$ cat regex.awk
BEGIN { RS="^$"    # whole file treated as a single record (regex RS, as supported by GNU awk)
        regex1="<td>Manual Checks</td>[[:space:]]+<td>0</td>"
        regex2="<td>Failures</td>[[:space:]]+<td>0</td>"
      }
$0 ~ regex1 && $0 ~ regex2 {print FILENAME}
NOTE: placing the code in a file (regex.awk) will make the follow-on find/awk call quite a bit cleaner
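As for why the attempts in the question print nothing: by default awk reads one line per record, so a regex is never tested against text that contains a newline. In addition, in GNU awk `\<` and `\>` are word-boundary operators rather than literal angle brackets, and `\\n` matches a backslash followed by `n`, not a newline. A minimal sketch of the record issue follows; the two-line sample and the variable names are made up for illustration, and paragraph mode (`RS=""`) is used only because it is portable. A real file containing blank lines would be split into several records in that mode, unlike with `RS="^$"`.

```shell
# Two-line sample (made up for this demo)
sample='<td>Failures</td>
<td>0</td>'

# Default RS: each line is its own record, so a two-line pattern never matches.
per_line=$(printf '%s\n' "$sample" |
  awk '/Failures<\/td>\n<td>0/ {n++} END {print n+0}')

# Paragraph mode: both lines arrive as one record, so the same pattern matches.
one_record=$(printf '%s\n' "$sample" |
  awk -v RS= '/Failures<\/td>\n<td>0/ {n++} END {print n+0}')

echo "$per_line $one_record"
```

The first count is 0 and the second is 1, which is the difference the `RS` assignment in regex.awk relies on.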
Sample input:
$ cat f1.html
... snip ...
<td>Failures</td>
<td>0</td> # match
... snip ...
<td>Manual Checks</td>
<td>9</td> # not a match
... snip ...
$ cat f2.html
... snip ...
<td>Failures</td>
<td>0</td> # match
... snip ...
<td>Manual Checks</td>
<td>0</td> # match
... snip ...
NOTE: comments added for clarification; the comments do not exist in the actual files
Adding this to a find call:
$ find . -name "f?.html" -exec awk -f regex.awk '{}' \;
./f2.html
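If an awk without regex `RS` support has to be used, one possible alternative is to stay in line-by-line mode and remember which label was seen on the previous line. This is only a sketch, not the approach above: it assumes each value sits on the line right after its label (blank lines in between are skipped), and the file name lines.awk, the temp directory, and the helper `report()` are hypothetical names chosen for the demo. Because state is reset on `FNR == 1`, it also works when find passes many files at once with `{} +`.

```shell
tmp=$(mktemp -d)

# Made-up sample files mirroring f1.html and f2.html
printf '<tr>\n<td>Failures</td>\n<td>0</td>\n</tr>\n<tr>\n<td>Manual Checks</td>\n<td>9</td>\n</tr>\n' > "$tmp/f1.html"
printf '<tr>\n<td>Failures</td>\n<td>0</td>\n</tr>\n<tr>\n<td>Manual Checks</td>\n<td>0</td>\n</tr>\n' > "$tmp/f2.html"

cat > "$tmp/lines.awk" <<'EOF'
# print the file name if both counters were found to be zero
function report() { if (file != "" && fail && man) print file }

# first line of each file: report on the previous file, then reset state
FNR == 1 { report(); file = FILENAME; fail = man = 0; prev = "" }

# skip blank lines so they do not separate a label from its value
/^[ \t\r]*$/ { next }

{
    if (prev == "Failures"      && $0 ~ /<td>0<\/td>/) fail = 1
    if (prev == "Manual Checks" && $0 ~ /<td>0<\/td>/) man  = 1
    prev = ""
    if ($0 ~ /<td>Failures<\/td>/)      prev = "Failures"
    if ($0 ~ /<td>Manual Checks<\/td>/) prev = "Manual Checks"
}

END { report() }
EOF

result=$(find "$tmp" -name 'f?.html' -exec awk -f "$tmp/lines.awk" {} +)
echo "$result"
```

With these two sample files, only the path of f2.html is printed, matching the result of the `RS="^$"` version.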