python, python-3.x, pandas, numpy, python-docx

Counting rows and their values in a Word .docx file using Python


I have a Word .docx file that contains a number of tables. Each table has different row and column names, but one row name appears in every table: "test automation", whose value is either "yes" or "no". How can I count the total number of "test automation" values across all tables, producing output like "TOTAL NO OF TEST AUTOMATION:yes=200,no=100"? I'm using Python 3.6 and am new to Python, so any help is appreciated. Below is my sample code for extracting a table and a specific column.

Image of sample data: Sample data

My code to extract a table from the .docx file looks like this:

import pandas as pd
from docx import Document

document = Document('test_word.docx')
table = document.tables[0]  # only the first table in the document

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    # The first row holds the column names.
    if i == 0:
        keys = tuple(text)
        continue
    # Map each remaining row's cells onto the column names.
    row_data = dict(zip(keys, text))
    data.append(row_data)
    print(data)

df = pd.DataFrame(data)
print(df)

Solution

  • This is the essential logic for counting the "Yes" values of the "Test automation" row. It assumes the row heading is in the first cell and the Yes/No value is in the fourth cell (index 3). Any Pandas manipulation is left to you:

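    from docx import Document


    def table_test_automation(table):
        """Return 1 if the table's 'Test automation' row says 'Yes', else 0."""
        for row in table.rows:
            row_heading = row.cells[0].text
            if row_heading != 'Test automation':
                continue
            # The Yes/No value is read from the fourth cell of the matching row.
            yes_no = row.cells[3].text
            return 1 if yes_no == 'Yes' else 0

        # No 'Test automation' row was found in this table.
        return 0


    document = Document('test_word.docx')
    yes_count = 0
    for table in document.tables:
        yes_count += table_test_automation(table)
    print(yes_count)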
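
The question also asks for the "no" total, and an exact string comparison would miss values such as "yes" or " Yes ". Below is a minimal sketch of one way to tally both. It assumes, as in the answer above, that the row heading is in the first cell and the value is in the fourth cell. The names `count_test_automation`, `format_summary`, `report`, and the `heading` and `value_col` parameters are hypothetical, introduced here only for illustration.

```python
from collections import Counter


def count_test_automation(tables, heading="test automation", value_col=3):
    """Tally the yes/no values of the 'test automation' row across tables.

    `tables` is any iterable of python-docx Table objects (or objects with
    the same .rows / .cells / .text shape). `value_col` is an assumption:
    the yes/no value is expected in the fourth cell of the matching row.
    """
    counts = Counter()
    for table in tables:
        for row in table.rows:
            cells = row.cells
            # Compare headings case-insensitively and ignore stray whitespace.
            if not cells or cells[0].text.strip().lower() != heading:
                continue
            if len(cells) > value_col:
                value = cells[value_col].text.strip().lower()
                if value in ("yes", "no"):
                    counts[value] += 1
            break  # only the first matching row per table is considered
    return counts


def format_summary(counts):
    """Build the summary line in the format requested in the question."""
    return "TOTAL NO OF TEST AUTOMATION:yes={},no={}".format(
        counts["yes"], counts["no"]
    )


def report(path):
    """Open a .docx file with python-docx and print the summary line."""
    from docx import Document  # imported here so the helpers above need no third-party package

    document = Document(path)
    print(format_summary(count_test_automation(document.tables)))
```

Calling `report('test_word.docx')` would then print a single summary line. Values other than "yes" or "no" are skipped rather than counted, which is a design choice of this sketch and may need adjusting for real data.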