
How to add a Data Frame with indexes using python-docx


I realize this has already been addressed here (e.g., Writing a Python Pandas DataFrame to Word document, python-docx: Parse a table to Panda Dataframe). Nevertheless, I believe this question is different.

I have used value_counts() and generated a DataFrame as below:

import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')

object_cols = list(df.select_dtypes(exclude=['int', 'float', 'int64', 'float64', 'int32', 'float32']).columns)

# Value Count & Percentage for object columns
c = df[object_cols].apply(lambda x: x.value_counts()).T.stack().astype(int)
p = (df[object_cols].apply(lambda x: x.value_counts(normalize=True)).T.stack() * 100).round(2)
cp = pd.concat([c,p], axis=1, keys=['Count', 'Percentage %'])

cp

The DataFrame looks like:

               Count  Percentage %
sex    Female     87         35.66
       Male      157         64.34
smoker No        151         61.89
       Yes        93         38.11
day    Fri        19          7.79
       Sat        87         35.66
       Sun        76         31.15
       Thur       62         25.41
time   Dinner    176         72.13
       Lunch      68         27.87

I'm trying to add the above DataFrame as a table in a document using python-docx:

from docx import Document

doc = Document()
doc.add_paragraph("Value Counts: ")

t = doc.add_table(cp.shape[0]+1, cp.shape[1])

# Set table style
t.style = 'Colorful List Accent 1'

# Add the header row.
for j in range(cp.shape[-1]):
    t.cell(0,j).text = str(cp.columns[j])

# add the rest of the data frame
for i in range(cp.shape[0]):
    for j in range(cp.shape[-1]):
        t.cell(i+1,j).text = str(cp.values[i,j])
        
filename = "output/ValueCOunts_Report.docx"
# Save the docx (the output/ directory must already exist)
doc.save(filename)

I'm able to add the table, but without the index:

Count   Percentage %
87      35.66
157     64.34
151     61.89
.....
.....
.....


How can I add the complete DataFrame, including its index, as a table to the document?


Solution

  • This is a slightly hacky solution: it moves the index levels into columns and then edits those columns so they look like an index.

    Reset the index, then use Series.duplicated with np.where to replace the repeated values in the first column with blanks:

    import numpy as np

    cp = cp.rename_axis(['Attr', 'Val']).reset_index()
    cp['Attr'] = np.where(cp['Attr'].duplicated(), '', cp['Attr'])
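
To see what these two lines do in isolation, here is a minimal sketch on a small hand-built frame. The name `toy` and its four rows are assumptions made for illustration (the values are copied from the table in the question); it stands in for `cp` so the snippet runs without seaborn.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `cp` (assumed data): a two-level index plus two value columns.
idx = pd.MultiIndex.from_tuples(
    [("sex", "Female"), ("sex", "Male"), ("smoker", "No"), ("smoker", "Yes")]
)
toy = pd.DataFrame(
    {"Count": [87, 157, 151, 93], "Percentage %": [35.66, 64.34, 61.89, 38.11]},
    index=idx,
)

# Name the index levels, then move them into ordinary columns.
flat = toy.rename_axis(["Attr", "Val"]).reset_index()

# duplicated() is True for every occurrence after the first,
# so only the first row of each group keeps its label.
flat["Attr"] = np.where(flat["Attr"].duplicated(), "", flat["Attr"])

print(flat)
```

Note that duplicated() flags every later repeat, not only consecutive ones, so this relies on each attribute name appearing in a single contiguous group, as it does here.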
    

    Then executing your code on the modified cp produces a table that includes the Attr and Val columns.