grep command to find a quote (") or apostrophe (') in the XML tag values of a file

<?xml version="1.0" encoding="UTF-8"?>
 <Document>
    <InnerDoc>
        <GrpHdr>
            <MsgId>aaa.xml</MsgId>
            <CreDtTm>2023-08-15T13:35:33.0Z</CreDtTm>
            <MsgRcpt>
                    <Id  value="111">
                    <OrgId>
                        <Othr>
                            <Id>asa-"-as'#</Id>
                        </Othr>
                    </OrgId>
                </Id>
            </MsgRcpt>
            <tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL " - '</tag1>
            <tag2 info = "AddInf2">Report Map =  " - '</tag1>
        </GrpHdr>
    </InnerDoc>
</Document>

In the above XML I need to find whether there is at least one occurrence of either quote (") or apostrophe (') in the XML tag value only.

For example, in

<tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL " - '</tag1>

grep should evaluate the string between > and < only.

I tried a simple special-character search, but it also matches double quotes outside the tag values, such as version="1.0" in the XML declaration. I don't need those and want to exclude them.

Solution

Joachim Sauer's comment is correct. For example, even the simplest xmllint invocation on your test input yields this:

$: xmllint file
file:17: parser error : Opening and ending tag mismatch: tag2 line 17 and tag1
            <tag2 info = "AddInf2">Report Map =  " - '</tag1>
                                                             ^

A real XML parser will also make it easier to handle escaped characters such as &quot; and &apos;.

With his much-appreciated assist:

$: xmllint --xpath "//text()[contains(.,'\"') or contains(., \"'\")]" file
asa-"-as'#Double only: " Single only: '

I'm still trying to figure out a way to get newlines between the matches, and maybe line numbers.

That said, if all you really want is to find the lines with a single or double quote in that value space, grep can do it:

$: grep -n $'>[^<]*[\'"][^<]*<' file
11:                            <Id>asa-"-as'#</Id>
16:            <tag1 info = "AddInf1">Report Map = PRIOR DAY BALTRAN INCREMENTAL " - '</tag1>
17:            <tag2 info = "AddInf2">Report Map =  " - '</tag1>
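Since the question only asks whether at least one such value exists, the exit status of grep -q may be all you need. Here is a minimal sketch of that idea; the has_quoted_value helper and the sample file are made up for illustration and are not part of the question's data.

```shell
# Hypothetical sample file; the path comes from mktemp, not from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<Doc>
    <Id value="111">plain</Id>
    <Note>it's quoted</Note>
</Doc>
EOF

# has_quoted_value FILE (hypothetical helper):
# exits 0 if any line has ' or " between a > and the next <, else 1.
has_quoted_value() {
    grep -q ">[^<]*['\"][^<]*<" "$1"
}

if has_quoted_value "$sample"; then
    echo "quote found in a tag value"
else
    echo "no quotes in tag values"
fi
```

The pattern is the same one used above, just written inside double quotes so that only the " needs a backslash.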

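This is going to break if the tag-delimiting characters (< and >) appear literally inside the data, for example a > inside a quoted attribute value or a < inside a CDATA section. Because grep works line by line, it will also miss a value whose text spans more than one line.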
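To make that limitation concrete, here is a small sketch with an invented two-element file (not taken from the question). The first line is reported even though its text value has no quotes, and the second element is missed because its value is split across lines.

```shell
# Hypothetical input chosen to trip up the line-based pattern.
tricky=$(mktemp)
cat > "$tricky" <<'EOF'
<a note="x > 'y'">plain</a>
<b>spans
two 'lines'</b>
EOF

# Line 1 matches because of the > inside the attribute (false positive);
# lines 2-3 hold a quoted value but never match (false negative).
hits=$(grep -n ">[^<]*['\"][^<]*<" "$tricky")
echo "$hits"
```

If your data can look like this, the xmllint approach is the safer route.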

Note that the $'...' construct is a Bash-ism (ksh93 and zsh support it too). If it's unavailable, you may need a more complicated bit of creative cross-quoting to get both quote characters into the pattern correctly.

$: grep -n '>[^<]*["'"'][^<]*<" file
11:                            <Id>asa-"-as'#</Id>
16:            <tag1 info = "AddInf1">Double only: " </tag1>
17:            <tag2 info = "AddInf2">Single only: ' </tag1>

$: grep -n '>[^<]*['"'"'"][^<]*<' file
11:                            <Id>asa-"-as'#</Id>
16:            <tag1 info = "AddInf1">Double only: " </tag1>
17:            <tag2 info = "AddInf2">Single only: ' </tag1>
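If the cross-quoting gets hard to read, one alternative is to read the pattern from a quoted here-document, where neither quote character needs escaping. This is only a sketch under that assumption; the variable names and the sample file are invented for illustration.

```shell
# Read the regex verbatim; the quoted 'EOF' delimiter disables all expansion.
IFS= read -r pattern <<'EOF'
>[^<]*['"][^<]*<
EOF

# Hypothetical sample file standing in for the real input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Doc>
    <Id value="111">plain</Id>
    <Id>asa-"-as'#</Id>
</Doc>
EOF

# -e guards against the pattern being mistaken for an option.
matches=$(grep -n -e "$pattern" "$sample")
echo "$matches"
```

This should work in any POSIX shell, since it relies only on read -r and a here-document.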