python, pandas, dataframe

Writing a Python Pandas DataFrame to Word document


I'm working on a Python-generated report that uses Pandas DataFrames. Currently I am using the DataFrame.to_string() method, but this writes the table to the file as plain text. Is there a way to write it to a Word document as an actual table, so that I can apply table formatting?

Code:

SEMorgkeys = client.domain_organic(url, database = "us", display_limit = 10, export_columns=["Ph,Pp,Pd,Nq,Cp,Ur,Tr"])
org_df = pd.DataFrame(SEMorgkeys)

with open(name, 'w') as f:
    f.write("\nOrganic:\n")
    f.write(org_df.to_string(index=False, justify="left"))

Current Printout (as string):

CPC    Keyword                        Position Difference Previous Position Search Volume Traffic (%) Url                                               
75.92       small business factoring   0                   1                 210          11.69       https://www..com/small-business-f...
80.19              factoring company   0                   8                1600           5.72       https://www..com/factoring-vs-ban...

Solution

  • You can write the table straight into a .docx file using the python-docx library.

    If you use Conda, or installed Python through Anaconda, you can install it from the command line:

    conda install python-docx --channel conda-forge
    

    Or install it with pip from the command line:

    pip install python-docx
    

    Once it is installed, we can use it to open the document, add a table, and then fill each cell's text with the values from the data frame.

    import docx
    import pandas as pd
    
    # I am not sure how you are getting your data, but you said it is a
    # pandas data frame; `data` stands in for whatever you build it from
    df = pd.DataFrame(data)
    
    # open an existing document (docx.Document() with no argument starts a new one)
    doc = docx.Document('./test.docx')
    
    # add a table to the end and create a reference variable
    # extra row is so we can add the header row
    t = doc.add_table(df.shape[0]+1, df.shape[1])
    
    # add the header row; labels are converted with str() because
    # cell text must be a string
    for j in range(df.shape[-1]):
        t.cell(0,j).text = str(df.columns[j])
    
    # add the rest of the data frame
    for i in range(df.shape[0]):
        for j in range(df.shape[-1]):
            t.cell(i+1,j).text = str(df.values[i,j])
    
    # save the doc
    doc.save('./test.docx')