c# .net ms-word openxml openxml-sdk

Detecting bold and underlined text and headings in a Word document with the Open XML SDK in C#


I would like to detect the different styles and headings in a selected Word document with the Open XML SDK in C#. This is what I have:

    public string getWordDescription(string path)
    {
        string text = "";
        // Open the file read-only; the using block disposes the document
        // together with the underlying package.
        using (WordprocessingDocument wordDocument =
            WordprocessingDocument.Open(path, false))
        {
            Body body = wordDocument.MainDocumentPart?.Document?.Body;
            if (body != null)
            {
                foreach (Paragraph par in body.Descendants<Paragraph>())
                {
                    text += par.InnerText;
                    text += "<br />";
                }
            }
        }
        return text;
    }

So while looping over the paragraphs, I would like to detect whether any styling is applied or whether the paragraph is a heading.


Solution

  • Remember that paragraphs are blocks, and blocks are made up of runs. So inside your foreach loop you'll need an inner loop that goes through the runs. Run elements contain RunProperties elements that describe the inline formatting. Alternatively, I believe you can use a single loop over all runs, like the following:

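    foreach (Run run in body.Descendants<Run>())
    {
        // RunProperties is null when the run carries no direct formatting.
        RunProperties props = run.RunProperties;
        // <w:b/> with no val attribute means bold; <w:b w:val="false"/> turns it off.
        if (props?.Bold != null && (props.Bold.Val == null || props.Bold.Val.Value))
        {
            // the run is bold, so do something with it
        }

        text += run.InnerText;
        text += "<br />";
    }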
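
To cover headings and underline as well, the two loops described above can be combined. The following is only a sketch under a few assumptions: the class and method names (`WordFormattingSketch`, `Describe`, `IsOn`) are made up for illustration, heading paragraphs are assumed to use style ids that start with "Heading" (true for the default English styles such as "Heading1", but not guaranteed for localized or custom templates), and only direct formatting on a run is inspected, so bold or underline inherited from a paragraph or character style will not be detected.

```csharp
using System;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordFormattingSketch
{
    // Hypothetical helper: an on/off element such as <w:b/> counts as "on"
    // when it is present and its val attribute is absent or true.
    private static bool IsOn(OnOffType flag)
    {
        return flag != null && (flag.Val == null || flag.Val.Value);
    }

    // Hypothetical method: returns the document text with simple HTML markers
    // for headings, bold runs and underlined runs. Text is not HTML-escaped.
    public static string Describe(string path)
    {
        var text = new StringBuilder();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart?.Document?.Body;
            if (body == null)
            {
                return string.Empty;
            }

            foreach (Paragraph par in body.Descendants<Paragraph>())
            {
                // Style id referenced by the paragraph; null when none is set.
                string styleId = par.ParagraphProperties?.ParagraphStyleId?.Val?.Value;
                // Assumption: heading style ids start with "Heading".
                bool isHeading = styleId != null
                    && styleId.StartsWith("Heading", StringComparison.OrdinalIgnoreCase);

                text.Append(isHeading ? "<h2>" : "<p>");
                foreach (Run run in par.Descendants<Run>())
                {
                    RunProperties props = run.RunProperties;
                    bool bold = IsOn(props?.Bold);
                    // Count a run as underlined only when <w:u> has an explicit
                    // val other than "none".
                    Underline u = props?.Underline;
                    bool underlined = u?.Val != null && u.Val.Value != UnderlineValues.None;

                    if (bold) text.Append("<b>");
                    if (underlined) text.Append("<u>");
                    text.Append(run.InnerText);
                    if (underlined) text.Append("</u>");
                    if (bold) text.Append("</b>");
                }
                text.Append(isHeading ? "</h2>" : "</p>");
            }
        }
        return text.ToString();
    }
}
```

Calling `WordFormattingSketch.Describe(path)` in place of `getWordDescription(path)` should then give one `<p>` or `<h2>` element per paragraph. If the documents rely on style-based formatting, the style definitions in the styles part would need to be resolved as well, which this sketch does not attempt.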