I have a huge text file filled with HTML option tags. I only want the contents of each tag's value attribute. For example:
<option value="API" datatype="string" datatype_value="0">API</option>
<option value="Account" datatype="string" datatype_value="0">Account</option>
<option value="Address - asn" datatype="string" datatype_value="0">Address - asn</option>
From the first tag, for instance, I only want "API", the text right after 'option value='.
Right now I have this:
awk -F "option value=" '{print $2}' /inputFilePath | awk '{print $1}'
It works, but ONLY on the first line of the file. So when I run the command above on the file, my output is only:
"API"
And not "Account", "Address" or anything after.
Any thoughts on anything I could be doing wrong? Thanks in advance!
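Modify RS (the record separator) instead, and split fields on the double quote, so that $1 of each record is the value: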
awk 'BEGIN { RS = "<option value=\"" ; FS = "\""; } NF { print $1 }' file
Output:
API
Account
Address - asn
I just hope it works with your awk, as nawk doesn't accept it (POSIX leaves the behavior of a multi-character RS unspecified).
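If your awk rejects a multi-character RS, a portable variant could be sketched with split() instead of RS. Treat this as a sketch under assumptions rather than a drop-in fix: extract_values is a made-up helper name, the sample line is invented for illustration, and very long lines may still hit limits in some older awk implementations.

```shell
# Hypothetical helper: prints every value="..." found after "<option ".
# Reads the files given as arguments, or stdin when none are given.
extract_values() {
  awk '{
    # Cut the line at each occurrence of the opening delimiter.
    # parts[1] is whatever precedes the first tag, so start at 2.
    n = split($0, parts, "<option value=\"")
    for (i = 2; i <= n; i++) {
      # The value runs up to the next double quote.
      end = index(parts[i], "\"")
      if (end > 0) print substr(parts[i], 1, end - 1)
    }
  }' "$@"
}

# Invented sample: two tags on a single line.
printf '%s\n' '<option value="API" datatype="string">API</option><option value="Account" datatype="string">Account</option>' | extract_values
# prints API, then Account, on separate lines
```

Because the loop walks every piece of the line, it does not matter whether the tags are spread over many lines or packed into one.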
Yet another approach, using GNU awk (the three-argument form of match() is a gawk extension):
gawk '{ t = $0; while (match(t, /<option value="([^"]*)"(.*)/, a)) { print a[1]; t = a[2] } }' file
I deliberately used [^"]* since I consider empty values still valid for your query, but you can change that to [^"]+ if you prefer.
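To see what the choice between * and + changes, here is a small sketch that runs the same loop with either quantifier. It is not the command above: it uses the two-argument match() with RSTART and RLENGTH so it does not need gawk. The helper name extract_with and the sample line are made up for illustration.

```shell
# Hypothetical helper: $1 is the quantifier ("*" or "+"); input comes from stdin.
extract_with() {
  awk -v re="<option value=\"[^\"]$1\"" '{
    t = $0
    while (match(t, re)) {
      # The match is <option value="VALUE": skip the 15-character prefix
      # and drop the closing quote (15 + 1 = 16 characters of framing).
      print substr(t, RSTART + 15, RLENGTH - 16)
      # Continue scanning after the end of this match.
      t = substr(t, RSTART + RLENGTH)
    }
  }'
}

# Invented sample: the first tag has an empty value.
sample='<option value="">none</option><option value="API">API</option>'
printf '%s\n' "$sample" | extract_with '*'   # prints an empty line, then API
printf '%s\n' "$sample" | extract_with '+'   # prints only API
```

With * the empty value is reported as an empty line; with + that tag is skipped entirely.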