python-3.x, sorting, ms-word, python-docx

Sort MS Word paragraphs alphabetically with Python


How can I sort MS Word paragraphs alphabetically with python-docx?

I tried several things but can't get it working. Could something like the code below do the job?

    from docx import Document

    document = Document()
    document.add_paragraph('B - paragraph two')
    document.add_paragraph('A - paragraph one')

    document.paragraphs.sort(key=lambda x: x.text)

    document.save('sorted_paragraphs.docx')

Expected result in sorted_paragraphs.docx:

A - paragraph one
B - paragraph two

In other words, is there a way to do with Python what the sort command in the MS Word GUI does?

The point is to change the position of the paragraphs in the document so that they are displayed in alphabetical order, based on each paragraph's first letter.


Solution

  • Something like this should do the trick:

    # --- range of paragraphs you want to sort, by paragraph index
    # --- note that the last paragraph (18) is not included, consistent
    # --- with Python "slice" notation.
    start, end = 8, 18
    
    # --- create a sorted list of triples of paragraph-text (the basis
    # --- for the sort), paragraph index (a tie-breaker when two texts
    # --- are equal), and the paragraph `<w:p>` element for each
    # --- paragraph in range.
    text_element_triples = sorted(
        (paragraph.text, i, paragraph._p)
        for i, paragraph in enumerate(document.paragraphs[start:end])
    )
    
    # --- move each paragraph element into the sorted position, starting
    # --- with the first one in the list
    _, _, last_p = text_element_triples[0]
    
    for _, _, p in text_element_triples[1:]:
        last_p.addnext(p)
        last_p = p