<?xml version="1.0" encoding="UTF-8"?>
<Document>
<InnerDoc>
<GrpHdr>
<MsgId>aaa.xml</MsgId>
<CreDtTm>2023-08-15T13:35:33.0Z</CreDtTm>
<MsgRcpt>
<Id value="111">
<OrgId>
<Othr>
<Id>asa-"-as'#</Id>
</Othr>
</OrgId>
</Id>
</MsgRcpt>
<tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL " - '</tag1>
<tag2 info = "AddInf2">Report Map = " - '</tag1>
</GrpHdr>
</InnerDoc>
</Document>
For the above XML I need to replace every " (double quote) with &quot; and every ' (single quote) with &apos;
For example: <tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL &quot; - &apos;</tag1>
The replacement should apply only to the text of an element's value, so it should match only text between > and <. Could you please suggest the correct sed command for this?
I tried a sed command, but it replaces every quote in the file. I need a pattern match that only replaces within the text between > and <.
Using GNU awk for multi-char RS and RT:
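$ awk -v RS='>[^<]+<' -v ORS= '{
    # RT holds the text that matched RS, i.e. ">...text...<"
    gsub(/"/,"\\&quot;",RT)      # \\& yields a literal & in the replacement
    gsub(/\047/,"\\&apos;",RT)   # \047 is the octal code for a single quote
    print $0 RT
}' file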
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<InnerDoc>
<GrpHdr>
<MsgId>aaa.xml</MsgId>
<CreDtTm>2023-08-15T13:35:33.0Z</CreDtTm>
<MsgRcpt>
<Id value="111">
<OrgId>
<Othr>
<Id>asa-&quot;-as&apos;#</Id>
</Othr>
</OrgId>
</Id>
</MsgRcpt>
<tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL &quot; - &apos;</tag1>
<tag2 info = "AddInf2">Report Map = &quot; - &apos;</tag1>
</GrpHdr>
</InnerDoc>
</Document>
This is obviously fragile, as > or < might appear in text or within tag attributes.
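Since the question asked for sed, here is a rough line-based sketch of the same idea. The function name escape_text_quotes is made up for illustration, and it assumes each text value sits on the same line as its enclosing > and <. It loops because each substitution escapes only one quote per pass, and it is just as fragile as noted above.

```shell
# Hypothetical helper: escape " and ' only in text found between '>' and '<'
# on the same line. Attribute values are left alone because they sit after
# a '<', never between a '>' and the next '<'.
escape_text_quotes() {
  sed -E \
    -e ':a' -e 's/(>[^<"]*)"([^<]*<)/\1\&quot;\2/' -e 'ta' \
    -e ':b' -e "s/(>[^<']*)'([^<]*<)/\\1\\&apos;\\2/" -e 'tb'
}
```

It would be used as: escape_text_quotes < file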