I have the following grep output from a folder:
./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">
./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">
I want to extract the following fields, separated by some delimiter, say ',':
Data1/TEST_Data1, durationEnd="1", timeUnit="D"
Data2/TEST_Data2, durationEnd="2", timeUnit="M"
Please help me achieve this using basic Linux commands.
I would do it using GNU AWK in the following way. Let the content of file.txt be
./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">
./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">
then
awk 'BEGIN{OFS=", ";FPAT="(^[^ ]+xml)|((durationEnd|timeUnit)=\"[^\"]+\")"}{gsub(/\.([/]|xml)/, "", $1);print}' file.txt
output
Data1/TEST_Data1, durationEnd="1", timeUnit="D"
Data2/TEST_Data2, durationEnd="2", timeUnit="M"
Explanation: I used FPAT to describe the fields I want to extract from each line, namely either a run of non-space characters at the start of the line that ends with xml, or durationEnd or timeUnit followed by =, a ", one or more non-" characters, and a closing ". Then I remove every . followed by / or xml from the first field (note that . has to be a literal ., so it is escaped). Modifying a field makes awk rebuild the record, so print outputs the fields joined by a comma and a space, which I set as the output field separator (OFS).
Disclaimer: I tested it only with shown samples.
(tested in gawk 4.2.1)
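FPAT is specific to GNU AWK, so if gawk is not available, the same extraction can be sketched with match(), which any POSIX awk provides. This is only a rough alternative and not part of the solution above: attr is a hypothetical helper name, the input variable just holds the two sample lines from the question, and like the FPAT version it has been checked against those samples only.

```shell
# Sample grep output, copied from the question.
input='./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">
./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">'

# attr(line, name) is a hypothetical helper, not a built-in.
result=$(printf '%s\n' "$input" | awk '
function attr(line, name) {
    # The leading space keeps the name from matching inside a longer attribute name.
    if (match(line, " " name "=\"[^\"]*\""))
        return substr(line, RSTART + 1, RLENGTH - 1)   # skip that leading space
    return ""                                          # attribute not present
}
{
    path = $0
    sub(/\.xml:.*/, "", path)   # drop ".xml:" and the rest of the line
    sub(/^\.\//, "", path)      # drop the leading "./"
    print path ", " attr($0, "durationEnd") ", " attr($0, "timeUnit")
}')

printf '%s\n' "$result"
```

With real data, the printf could be replaced by the original grep command piped straight into awk. Unlike the FPAT version, this one prints the attributes in a fixed order (durationEnd, then timeUnit) regardless of where they appear in the line.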