python python-3.x python-docx

python-docx: extract a table from a Word .docx file


I know this question has been asked before, but the other answers did not work for me. I have a Word file that contains a single table, and I want that table as the output of my Python program. I'm using Python 3.6 and have installed python-docx. Here is my code for the data extraction:

from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
    print (data)

I want the output to look exactly like the table in the Word file. Thanks in advance.


Solution

  • Your code works fine for me. How about loading the result into a pandas DataFrame?

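    import pandas as pd
    from docx import Document
    
    document = Document('test_word.docx')
    table = document.tables[0]  # first table in the document
    
    data = []
    
    keys = None
    for i, row in enumerate(table.rows):
        # cell.text is always a str, even for numeric-looking cells
        text = (cell.text for cell in row.cells)
    
        if i == 0:
            # the first row holds the column headers
            keys = tuple(text)
            continue
        # pair each header with the matching cell of this row
        row_data = dict(zip(keys, text))
        data.append(row_data)
    
    print(data)  # print once, after all rows have been collected
    
    df = pd.DataFrame(data)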
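
A more compact variant, offered only as a sketch: collect every row as a list of cell text and let pandas use the first row as the header. The helper name `rows_to_dataframe` and the `sample_rows` data are hypothetical stand-ins; with python-docx the rows would come from `[[cell.text for cell in row.cells] for row in table.rows]`.

```python
import pandas as pd


def rows_to_dataframe(rows):
    """Build a DataFrame from rows of cell text; the first row is the header."""
    rows = [list(r) for r in rows]
    if not rows:
        return pd.DataFrame()
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Hypothetical stand-in for the text extracted from the Word table.
sample_rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
]

df_sample = rows_to_dataframe(sample_rows)
print(df_sample)
```

One difference from the dict-based loop: a dict keeps only one entry per key, so repeated header names would silently collapse there, while this version keeps every column.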
    

    How can I display a particular row or column of that table? You can select rows and columns by position with iloc:

    # iloc[rows, columns]; values are strings because cell.text returns str
    df.iloc[0, :].tolist()   # ['5', '6', '7', '8'] - row index 0
    df.iloc[:, 0].tolist()   # ['5', '9', '13', '17'] - column index 0
    df.iloc[0, 0]            # '5' - cell (0, 0)
    df.iloc[1:, 2].tolist()  # ['11', '15', '19'] - column index 2, skipping the first row
    

    and so on...

    If your columns have names, you can also select them by name. Here the names are the strings "1" to "4" (not integers), because cell.text always returns a string:

    # df["name"].tolist()
    df["1"].tolist()  # ['5', '9', '13', '17'] - column named "1"
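
Since python-docx hands back every cell as text, both the column names and the values in the frame are strings. If you want real numbers, one possible approach is `pd.to_numeric`. This is a sketch that assumes every cell holds a number; the sample frame below is a stand-in for the one built from the Word table.

```python
import pandas as pd

# Stand-in for the frame built from the Word table; every value is text.
df = pd.DataFrame(
    [
        ["5", "6", "7", "8"],
        ["9", "10", "11", "12"],
        ["13", "14", "15", "16"],
        ["17", "18", "19", "20"],
    ],
    columns=["1", "2", "3", "4"],
)

# Convert each column to a numeric dtype; raises ValueError on non-numeric text.
df_num = df.apply(pd.to_numeric)

# Optional: turn the header labels into integers as well.
df_num.columns = df_num.columns.astype(int)

print(df_num[1].tolist())  # [5, 9, 13, 17]
```

After the conversion, `df_num[1]` works with an integer label, whereas the original frame needs `df["1"]`.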
    

    print(df)
    

    This prints the following, which matches the table in my sample document:

        1   2   3   4
    0   5   6   7   8
    1   9   10  11  12
    2   13  14  15  16
    3   17  18  19  20
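
If the goal is only to print the table as a grid, without the DataFrame index column, a plain-Python formatter may be enough. The following is a rough sketch; `format_grid` and `sample_rows` are hypothetical names, and the rows are assumed to be lists of cell text such as `[cell.text for cell in row.cells]`.

```python
def format_grid(rows):
    """Return rows of cell text as left-aligned, column-padded lines."""
    rows = [[str(c) for c in r] for r in rows]
    n_cols = max((len(r) for r in rows), default=0)
    # Pad short rows so every row has the same number of columns.
    rows = [r + [""] * (n_cols - len(r)) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(n_cols)]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    ]
    return "\n".join(lines)


# Hypothetical stand-in for the text extracted from the Word table.
sample_rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
]

print(format_grid(sample_rows))
# 1 | 2  | 3  | 4
# 5 | 6  | 7  | 8
# 9 | 10 | 11 | 12
```

This only reproduces the cell text and column alignment; borders, fonts, and merged cells from the Word document are not represented.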