Extract bullets from a Word document using Aspose.Words in C#

I need to extract the text that has bullet styling from a Word document in C#. I am using the Aspose.Words library, but a solution with a different library is also welcome. I can already upload documents and extract the text with Heading1 styling, but when I try the same with the bullet styling I get nothing.

I am using the code below to get the text with Heading1 styling and that works.

var heading1 = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .OfType<Paragraph>() // NodeCollection enumerates Node, so narrow to Paragraph first
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1);
foreach (var head1 in heading1)

I am trying to use the code below to get the text with bullet styling and this does NOT work.

var bullets = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .OfType<Paragraph>() // NodeCollection enumerates Node, so narrow to Paragraph first
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.ListBullet);
foreach (var bullet in bullets)

I also tried the ListBullet 1, 2, 3, 4 and 5 style identifiers, but that did not fix the problem either.


  • I am now using this to successfully extract the list items from a Word file and put them into a ListBox.

    // Requires: using System.Linq; using Aspose.Words; using Aspose.Words.Lists;
    // Take the first file name from the list box.
    string fileName = listBox1.Items.Cast<string>().FirstOrDefault();
    // Open the document.
    Document doc = new Document(fileName);
    // Refresh the list labels so that paragraph.ListLabel is populated.
    doc.UpdateListLabels();
    NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);
    // Visit only the paragraphs that are list items.
    foreach (Aspose.Words.Paragraph paragraph in paras.OfType<Aspose.Words.Paragraph>().Where(p => p.ListFormat.IsListItem))
    {
        //listBox19.Items.Add($"List item paragraph #{paras.IndexOf(paragraph)}");
        // Export the paragraph to plain text and trim any paragraph formatting characters.
        string paragraphText = paragraph.ToString(SaveFormat.Text).Trim();
        // Remove the bullet character and the space in front of the item text.
        string bullet = paragraphText.Length >= 2 ? paragraphText.Remove(0, 2) : string.Empty;
        ListLabel label = paragraph.ListLabel;
    }
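
A likely reason the style-based query in the question finds nothing: in Word, a paragraph can be bulleted through direct list formatting while still carrying a style such as Normal or List Paragraph, so StyleIdentifier.ListBullet only matches paragraphs that were explicitly given the "List Bullet" style. Note also that IsListItem is true for numbered lists as well as bulleted ones. Below is a minimal sketch, not a definitive implementation, of narrowing the result to bulleted items only by checking the number style of the paragraph's list level. The BulletExtractor class and ExtractBullets method are hypothetical names introduced for illustration, and the sketch assumes that GetText() returns the paragraph's own text without the list label.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

// Hypothetical helper class, introduced only for this example.
static class BulletExtractor
{
    // Returns the text of every paragraph whose list level uses a bullet label.
    public static List<string> ExtractBullets(Document doc)
    {
        return doc.GetChildNodes(NodeType.Paragraph, true)
            .OfType<Paragraph>()
            // IsListItem is checked first, so ListLevel is only read for list paragraphs.
            .Where(p => p.ListFormat.IsListItem
                     && p.ListFormat.ListLevel.NumberStyle == NumberStyle.Bullet)
            // Assumption: GetText() contains the run text plus the paragraph break,
            // but not the list label, so Trim() is enough to clean it up.
            .Select(p => p.GetText().Trim())
            .ToList();
    }
}
```

With the code from the answer above, this could be called as `var bullets = BulletExtractor.ExtractBullets(new Document(fileName));` and each entry added to the ListBox.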