c# ms-word vsto cross-reference

C# VSTO: How can I return the cross-references from a Word document?


I have developed a tool which iterates over a Word document, extracts text depending on the styles in the document, and inserts that text into an XML template.

I need to be able to check whether the paragraphs contain cross-references to figures, and extract the figure reference (or some way of identifying it) so that I can reproduce the cross-reference in the XML document.

After much searching I cannot find any information on how to do this. There is lots of info on inserting references, but not on retrieving them from the document.

I've played with the following code, which is passed each paragraph of the Word document to see if it contains any fields, but I'm not sure where to go from here. Any ideas, please?

private void checkParaForCrossReferences(word.Paragraph eachPara)
{
     var fields = eachPara.Range.Fields;

     // Fields is a non-generic collection, so the loop variable is typed explicitly
     foreach (word.Field field in fields)
     {
          //some code to get the cross reference information (figure or table number, caption or ID or something)

     }
}

Solution

  • Word uses REF fields for cross-references, so to get the cross-references for each paragraph the code would look something like this:

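    // Assumes: using Word = Microsoft.Office.Interop.Word; using System.Diagnostics;
    foreach (Word.Paragraph para in doc.Paragraphs)
    {
        Word.Range rng = para.Range;
        foreach (Word.Field fld in rng.Fields)
        {
            // Cross-references are stored as REF fields
            if (fld.Type == Word.WdFieldType.wdFieldRef)
            {
                Debug.Print("Code: " + fld.Code.Text + "; Result: " + fld.Result.Text);
            }
        }
    }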
    

    The tricky part is working out what the REF field actually refers to. When a cross-reference is inserted to anything other than a bookmark, Word automatically assigns a bookmark to the target range in the document. These bookmark names start with an underscore _ followed by Ref and a number. They are hidden on the page and in the Bookmarks dialog box by default. A typical REF field code: REF _Ref1571107

    So the field code alone cannot tell you what kind of cross-reference this is. Depending on which option was selected when the cross-reference to a Figure was inserted, it might be possible to get that from the Result. For example, if the cross-reference displays the entire caption or "only label and number", then the Result will contain the string Figure, which is fairly straightforward to test for.
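
    As a rough sketch of that check, the helper below tests whether a field's Result text starts with the caption label and pulls out the number that follows. The class and method names are hypothetical, and it assumes the English default label "Figure"; a localized or custom label would need a different pattern.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class FigureResultParser
    {
        // Assumption: the caption label is the English default "Figure".
        private static readonly Regex FigureLabel =
            new Regex(@"^\s*Figure\s+(\S+)", RegexOptions.IgnoreCase);

        // Returns the figure number as text, or null if the text
        // does not start with the label followed by a number token.
        public static string GetFigureNumber(string resultText)
        {
            if (string.IsNullOrEmpty(resultText)) return null;
            Match m = FigureLabel.Match(resultText);
            if (!m.Success) return null;
            // Drop punctuation that may trail the number in a full caption.
            return m.Groups[1].Value.TrimEnd(':', '.', ',');
        }

        public static void Main()
        {
            // Made-up sample strings standing in for fld.Result.Text
            Console.WriteLine(GetFigureNumber("Figure 3: Pump layout") ?? "(none)"); // prints 3
            Console.WriteLine(GetFigureNumber("above") ?? "(none)");                 // prints (none)
        }
    }
    ```

    Inside the field loop this would be called with fld.Result.Text; a null return means the Result does not carry the label, and the bookmark has to be inspected instead.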

    If neither of these is the case and one of the other three options was chosen, it's possible to extract the bookmark name from the field code, look up that bookmark in the document, and derive the information from its Range. Exactly how will depend on the individual document and how the figures were referenced.

    I don't have a C# environment running at the moment, but this is the basic VBA code to look up the bookmark from a REF field:

    sCode = Trim(fld.Code.Text)
    sBkmName = Split(Mid(sCode, InStr(sCode, "_Ref")), " ")(0) 'take the name up to the next space; the number of digits can vary
    Debug.Print ActiveDocument.Bookmarks(sBkmName).Range.Text
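
    A possible C# equivalent of the name extraction, written as a sketch rather than tested against Word: the helper below (hypothetical names) takes the first token after the REF keyword instead of relying on a fixed length. The sample field code, including the \h switch, is only an illustration.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class RefFieldParser
    {
        // Captures the first whitespace-delimited token after the REF keyword,
        // e.g. "_Ref1571107" in " REF _Ref1571107 \h ".
        private static readonly Regex RefTarget =
            new Regex(@"^\s*REF\s+(\S+)", RegexOptions.IgnoreCase);

        // Returns the bookmark name, or null if the text is not a REF field code.
        public static string GetBookmarkName(string fieldCode)
        {
            if (string.IsNullOrEmpty(fieldCode)) return null;
            Match m = RefTarget.Match(fieldCode);
            return m.Success ? m.Groups[1].Value : null;
        }

        public static void Main()
        {
            // Sample input standing in for fld.Code.Text
            Console.WriteLine(GetBookmarkName(" REF _Ref1571107 \\h ")); // prints _Ref1571107
        }
    }
    ```

    In the VSTO loop, the name returned for fld.Code.Text could then be checked with doc.Bookmarks.Exists(name) before reading doc.Bookmarks[name].Range.Text, which mirrors the VBA lookup above.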