Tags: c#, docx, page-numbering, gembox-document

Get page number from Word document


I'm using GemBox.Document and I need to find out which page my bookmark is located on inside a Word document. Can this be done? If not, can I find out the page on which some specific text is located?

I can find both the bookmark and the text, but I don't see any option that lets me get the page number from either.

DocumentModel document = DocumentModel.Load("My Document.docx");
Bookmark bookmark = document.Bookmarks["My Bookmark"];
ContentRange content = document.Content.Find("My Text").First();

Solution

  • This is a somewhat uncommon task for Word files. The files themselves have no page concept because they are a flow-document type; pages are specific to the application that renders the document (like Microsoft Word).

    The flow-document types (DOC, DOCX, RTF, HTML, etc. formats) define content in a flowable manner, which is designed for easier editing.
    The fixed-document types (PDF, XPS, etc. formats), on the other hand, do have a page concept because the content is fixed: the format specifies on which page and at which location each piece of content is rendered, so it looks the same in any application and on any screen.

    Nevertheless, here is how you can obtain the page number from some ContentPosition using GemBox.Document:

    static int GetPageNumber(ContentPosition position)
    {
        DocumentModel document = position.Parent.Document;

        // Insert a temporary PAGE field at the requested position.
        Field pageField = new Field(document, FieldType.Page);
        Field importedPageField = position.InsertRange(pageField.Content).Parent as Field;

        // Paginate the document so that the PAGE field gets its value.
        document.GetPaginator(new PaginatorOptions() { UpdateFields = true });

        // Read the field's result, then remove the temporary field content.
        int pageNumber = int.Parse(importedPageField.Content.ToString());
        importedPageField.Content.Delete();

        return pageNumber;
    }
    

    Also, here is how you can use it:

    DocumentModel document = DocumentModel.Load("My Document.docx");
    Bookmark bookmark = document.Bookmarks["My Bookmark"];
    ContentRange content = document.Content.Find("My Text").First();
    
    int bookmarkPageNumber = GetPageNumber(bookmark.Start.Content.Start);
    int contentPageNumber = GetPageNumber(content.Start);
    

    Last, note that calling the GetPaginator method is a somewhat heavy operation (basically, it is similar to saving the whole document to PDF), so it can be expensive for a rather large document.

    So, if you need to use GetPageNumber multiple times (for example, to find the page number of each bookmark), consider changing the code so that you first insert all the page fields you need, then call the GetPaginator method just once, and finally read the content of all those page fields.
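
    The batched approach described above could look roughly like the following sketch. It reuses only the calls already shown in GetPageNumber. The GetPageNumbers name, its parameters, and the choice to return the numbers in input order are assumptions made for illustration, and the code has not been run against a real document.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;
    using GemBox.Document;

    static class PageNumberHelper
    {
        // Hypothetical helper: returns one page number per bookmark, in input order.
        public static List<int> GetPageNumbers(DocumentModel document, IEnumerable<Bookmark> bookmarks)
        {
            var pageFields = new List<Field>();

            // Copy the bookmarks first (assumption: this avoids enumerating a
            // live collection while new content is inserted into the document).
            foreach (Bookmark bookmark in bookmarks.ToList())
            {
                // Insert a temporary PAGE field at the start of each bookmark.
                Field pageField = new Field(document, FieldType.Page);
                ContentPosition position = bookmark.Start.Content.Start;
                pageFields.Add(position.InsertRange(pageField.Content).Parent as Field);
            }

            // Paginate once for all inserted fields; this is the expensive step.
            document.GetPaginator(new PaginatorOptions() { UpdateFields = true });

            var pageNumbers = new List<int>();
            foreach (Field field in pageFields)
            {
                // Read each field's result, then remove the temporary field content.
                pageNumbers.Add(int.Parse(field.Content.ToString()));
                field.Content.Delete();
            }

            return pageNumbers;
        }
    }
    ```

    With this in place, something like PageNumberHelper.GetPageNumbers(document, document.Bookmarks) would paginate the document once, no matter how many bookmarks it contains.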