c# xml ms-word word-automation

C# parsing an xml document to find <w:p> tag with paraId equal to some value


I am using the following code to parse the XML of a Word document, but I can't figure out how to write the LINQ query to find the node I need. Basically, I have a table, so there are <w:tr> tags, and I want to home in on the one with a paraId equal to "12345" and get the <w:p> tags inside that <w:tr> node. My code so far is as follows:

    XDocument xdc = XDocument.Parse(docText);
    var arrNames = xdc.Root
        .Descendants("w:tr")
        .Select(e => e.Attribute("w14:paraId")).ToArray();

I am not very familiar with LINQ or XML, so this is a bit tricky for me. The structure of the XML is roughly as follows:

     <w:tr w:rsidRPr="00DC3742" w:rsidR="009336A8" w:rsidTr="1FCAB362" w14:paraId="570B1706" w14:textId="77777777">
        <w:tc>
           <w:tcPr>
              <w:tcW w:w="10728" w:type="dxa" />
              <w:tcBorders>
                 <w:top w:val="single" w:color="auto" w:sz="4" w:space="0" />
                 <w:left w:val="single" w:color="auto" w:sz="4" w:space="0" />
                 <w:bottom w:val="single" w:color="auto" w:sz="4" w:space="0" />
                 <w:right w:val="single" w:color="auto" w:sz="4" w:space="0" />
              </w:tcBorders>
              <w:tcMar />
           </w:tcPr>
           <w:p w:rsidRPr="00896505" w:rsidR="009336A8" w:rsidP="1FCAB362" w:rsidRDefault="009336A8" w14:paraId="3574A5D8" w14:textId="5A2EB300">
              <w:pPr>
                 <w:pStyle w:val="ListParagraph" />
                 <w:numPr>
                    <w:ilvl w:val="0" />
                    <w:numId w:val="10" />
                 </w:numPr>
                 <w:spacing w:after="0" />
                 <w:rPr>
                    <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                    <w:b w:val="1" />
                    <w:bCs w:val="1" />
                 </w:rPr>
              </w:pPr>
              <w:r w:rsidRPr="1FCAB362" w:rsidR="03C18AEE">
                 <w:rPr>
                    <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                    <w:b w:val="1" />
                    <w:bCs w:val="1" />
                 </w:rPr>
                 <w:t>This actually worked! Take 2</w:t>
              </w:r>
           </w:p>
        </w:tc>
     </w:tr>

So I am basically just trying to target the <w:p> tags within the table row and store them as an array or just serialize them to a string.


Solution

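  • Descendants("w:tr") does not work because LINQ to XML does not accept a prefixed name as a plain string; the colon is not allowed in an XName. Build each name from an XNamespace plus the local name instead. Try this: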

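    using System.Linq;
    using System.Xml.Linq;

    var xDoc = XDocument.Parse(docText);
    var root = xDoc.Root;
    
    // Resolve the namespaces bound to the "w" and "w14" prefixes.
    // GetNamespaceOfPrefix returns null if the prefix is not in scope on the root.
    var w = root.GetNamespaceOfPrefix("w");
    var w14 = root.GetNamespaceOfPrefix("w14");
    
    // First <w:tr> whose w14:paraId attribute equals the value being searched for.
    var xRow = root.Descendants(w + "tr").FirstOrDefault(
        tr => tr.Attribute(w14 + "paraId")?.Value == "12345");
    
    if (xRow != null)
    {
        // All <w:p> elements nested anywhere inside the matching row.
        var xPars = xRow.Descendants(w + "p").ToArray();
        // TODO, use "w:p" elements...
    }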
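
For completeness, here is a minimal, self-contained sketch of the same idea that also covers the "serialize them to a string" part of the question. It is only an illustration under a few assumptions: the class and method names (RowParagraphs, Find, Serialize) are made up for this example, the sample XML is a heavily trimmed stand-in for a real document.xml, and the two namespace URIs are hard-coded rather than looked up by prefix, so the lookup does not depend on which prefixes are declared on the root element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; the names here are illustrative, not part of any library.
public static class RowParagraphs
{
    // Namespace URIs that Word normally binds to the "w" and "w14" prefixes.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static readonly XNamespace W14 =
        "http://schemas.microsoft.com/office/word/2010/wordml";

    // Returns the <w:p> elements inside the first <w:tr> whose w14:paraId
    // equals paraId, or an empty array if no row matches.
    public static XElement[] Find(string docText, string paraId)
    {
        var row = XDocument.Parse(docText)
            .Descendants(W + "tr")
            // The explicit string cast yields null when the attribute is missing.
            .FirstOrDefault(tr => (string)tr.Attribute(W14 + "paraId") == paraId);

        return row == null
            ? new XElement[0]
            : row.Descendants(W + "p").ToArray();
    }

    // Serializes each paragraph without added indentation, one per line.
    public static string Serialize(XElement[] paragraphs)
    {
        return string.Join(
            Environment.NewLine,
            paragraphs.Select(p => p.ToString(SaveOptions.DisableFormatting)));
    }
}

public static class Program
{
    public static void Main()
    {
        // Trimmed, made-up sample that mirrors the structure in the question.
        const string docText = @"<w:document
    xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main""
    xmlns:w14=""http://schemas.microsoft.com/office/word/2010/wordml"">
  <w:body><w:tbl>
    <w:tr w14:paraId=""570B1706"">
      <w:tc><w:p w14:paraId=""3574A5D8""><w:r><w:t>This actually worked! Take 2</w:t></w:r></w:p></w:tc>
    </w:tr>
  </w:tbl></w:body>
</w:document>";

        var paragraphs = RowParagraphs.Find(docText, "570B1706");
        Console.WriteLine(paragraphs.Length);   // one <w:p> in the matching row
        Console.WriteLine(RowParagraphs.Serialize(paragraphs));
    }
}
```

Note that the paraId passed in has to be a value that actually appears on a <w:tr>, such as "570B1706" in the sample from the question; a value that matches no row simply gives an empty array. Because Descendants is used, paragraphs inside any nested tables in the row are returned as well.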