Tags: python, ms-word, libreoffice, doc, libreoffice-writer

How to remove line numbering from several .doc/.docx files on Linux?


I need to remove line numbering from a large collection of Word .doc/.docx files as part of a (Python) data processing pipeline.

I am aware of solutions in C# using Word.Interop (e.g. "Is it possible to use Microsoft.Office.Interop.Word to programmatically remove line numbering from a Word document?"), but I would prefer to do this with LibreOffice in --headless mode before evaluating MS Word + Wine solutions.

For a single file, the GUI steps in https://help.libreoffice.org/Writer/Line_Numbering work, but I need to do this for a lot of files, so I am looking for a macro, script, or command-line solution that can

1) iterate over a set of files
2) remove line numbering and save the result to file

and that can be triggered, e.g., with a Python subprocess call, or even through the Python scripting API (https://help.libreoffice.org/Common/Scripting).


Solution

  • To remove line numbering from a list of files in the working directory (and write the results as PDFs), start LibreOffice headless with a UNO socket listener from a Linux shell:

    soffice --headless --accept="socket,host=localhost,port=2002;urp;StarOffice.ServiceManager"
    

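    and then, in a Python interpreter that can import the uno module (LibreOffice's bundled Python, or a system Python with the python3-uno package on Debian/Ubuntu), run: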

    import os

    import uno
    from com.sun.star.beans import PropertyValue

    # connect once to the soffice listener started above
    localContext = uno.getComponentContext()
    resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
    ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
    smgr = ctx.ServiceManager
    desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)

    # export filter for PDF output
    p = PropertyValue()
    p.Name = 'FilterName'
    p.Value = 'writer_pdf_Export'

    # list .doc/.docx files in working dir
    files = [x for x in os.listdir('.') if x.lower().endswith((".doc", ".docx"))]

    # iterate on files
    for file in files:
        path = os.path.realpath(file)

        # open file
        model = desktop.loadComponentFromURL(uno.systemPathToFileUrl(path), "_blank", 0, ())
        try:
            # remove line numbers
            model.getLineNumberingProperties().IsOn = False

            # create pdf next to the source file, e.g. report.docx -> report.docx.pdf
            model.storeToURL(uno.systemPathToFileUrl(path + ".pdf"), (p,))
        finally:
            # close the document so it is not left open in soffice
            model.close(True)
    

    This should create one PDF per input file, with line numbering turned off, in your working directory.
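If the listener should be started from the Python pipeline itself (the question mentions a subprocess call), something like the following could replace the manual soffice command. This is a minimal sketch, assuming soffice is on PATH; the helper names soffice_command, wait_for_port and start_listener are hypothetical. Probing the port with a plain TCP connect is also an assumption; retrying resolver.resolve() until it stops raising com.sun.star.connection.NoConnectException is the more UNO-native alternative.

```python
import socket
import subprocess
import time


def soffice_command(port=2002, binary="soffice"):
    """Build the argv for a headless LibreOffice UNO listener on `port` (hypothetical helper)."""
    accept = f"socket,host=localhost,port={port};urp;StarOffice.ServiceManager"
    # Passing a list (no shell) avoids having to quote the ';' characters
    return [binary, "--headless", f"--accept={accept}"]


def wait_for_port(port, host="localhost", timeout=30.0):
    """Poll until a TCP connection to host:port succeeds; False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Listener not up yet: wait a little and retry
            time.sleep(0.2)
    return False


def start_listener(port=2002, binary="soffice", timeout=30.0):
    """Start soffice in the background and wait until its socket accepts connections."""
    proc = subprocess.Popen(soffice_command(port, binary))
    if not wait_for_port(port, timeout=timeout):
        proc.terminate()
        raise RuntimeError(f"soffice did not open port {port} within {timeout}s")
    return proc
```

Call start_listener() before the conversion loop and terminate the returned process afterwards. Because the soffice launcher may hand off to a child soffice.bin process, calling desktop.terminate() over UNO first is likely the cleaner way to shut LibreOffice down.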

    Useful links:
    Add line numbers and export to pdf via macro on OpenOffice forums
    LineNumberingProperties documentation
    Info on running a macro from the command line
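The question asks for the result to be saved back to file rather than exported to PDF. A sketch of that variant follows, assuming the LibreOffice export filter names "MS Word 2007 XML" (.docx) and "MS Word 97" (.doc), and writing to a hypothetical no_line_numbers/ directory so the originals are not overwritten. The helpers output_path, filter_for and save_as_word are illustrative names, not part of any API.

```python
import os

# Assumed mapping from input extension to LibreOffice export filter name
WORD_FILTERS = {
    ".docx": "MS Word 2007 XML",
    ".doc": "MS Word 97",
}


def output_path(path, out_dir="no_line_numbers"):
    """Absolute path of the cleaned copy: same file name, inside `out_dir` (hypothetical)."""
    return os.path.join(os.path.realpath(out_dir), os.path.basename(path))


def filter_for(path):
    """Pick the Word export filter matching the file's extension (case-insensitive)."""
    return WORD_FILTERS[os.path.splitext(path)[1].lower()]


def save_as_word(model, path, out_dir="no_line_numbers"):
    """Store an already-loaded UNO document `model` in its original Word format."""
    # Imported lazily: these modules exist only in a Python that has pyuno
    import uno
    from com.sun.star.beans import PropertyValue

    os.makedirs(out_dir, exist_ok=True)
    p = PropertyValue()
    p.Name = "FilterName"
    p.Value = filter_for(path)
    model.storeToURL(uno.systemPathToFileUrl(output_path(path, out_dir)), (p,))
```

Inside the conversion loop, save_as_word(model, file) would take the place of the PDF storeToURL call; the document still has to be closed afterwards.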