Tags: python, python-3.x, pandas, python-docx, pandas-apply

apply on a DataFrame passes the first row's values to all rows


When I call apply as shown below, the values passed as "row" appear to come exclusively from the first row of the DataFrame.

df.apply(make_word_file, axis=1)

Oddly, the file name passed to document.save() is correct: newname holds the right value from row['case_name'] for each row. However, if I print(row), it prints the values from the first row.

def make_word_file(row):
    for key, value in mapfields.items():
        # print(row)
        regex1 = re.compile(key)
        replace1 = str(row[value])
        docx_replace_regex(document, regex1, replace1)

    # Raw string avoids the invalid "\/" escape sequence
    newname = remove(row['case_name'], r'\/:*?"<>|,.')
    print(newname)
    document.save(datadir + row["datename"] + "_" + row["court"] + "_" + newname + ".docx")

I expected print(row) to print the values from each row in the DataFrame, not just the first.

EDIT for clarity:

This script is a mail merge that generates .docx Word files. mapfields is a dict in the form regex: column name. document is a python-docx Document object.

mapfields = {
    "VARfname": "First Name",
    "VARlname": "Last Name",
}
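
The remove helper called in make_word_file is not shown in the question. A minimal sketch, assuming it simply strips the listed characters so the case name is safe to use in a file name (the body and the sample case name here are hypothetical):

```python
def remove(value, deletechars):
    """Hypothetical helper: drop every character in deletechars from value."""
    # str.maketrans with a third argument maps those characters to None
    return str(value).translate(str.maketrans("", "", deletechars))


newname = remove('Smith v. Jones: "Appeal"?', r'\/:*?"<>|,.')
print(newname)
```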

Solution

  • This ended up being a loop/python-docx issue, not a pandas one.

    The same document object was reused for every row and modified in place, so after the first row's replacements there were no placeholders left for the regexes to find. Loading the document template inside the function fixed the issue.

    import re
    from docx import Document

    def make_word_file(case_row):
        # Load a fresh copy of the template for every row
        document = Document(directory + fname)
        for key, value in mapfields.items():
            regex1 = re.compile(key)
            replace1 = str(case_row[value])
            docx_replace_regex(document, regex1, replace1)

        document.save(location + ".docx")
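
The failure mode can be reproduced without pandas or python-docx. The following is a minimal sketch that uses a plain list of strings as a stand-in for the document; load_template, replace_in_place, merge, the sample rows, and the template text are all hypothetical, and only mapfields follows the question.

```python
import re

# mapfields as in the question: placeholder regex -> column name
mapfields = {"VARfname": "First Name", "VARlname": "Last Name"}

# Hypothetical sample data standing in for the DataFrame rows
rows = [
    {"First Name": "Ada", "Last Name": "Lovelace"},
    {"First Name": "Alan", "Last Name": "Turing"},
]


def load_template():
    """Hypothetical stand-in for Document(path): returns a fresh, mutable template."""
    return ["Dear VARfname VARlname,"]


def replace_in_place(doc, regex, replacement):
    """Hypothetical stand-in for docx_replace_regex: mutates doc in place."""
    for i, line in enumerate(doc):
        doc[i] = regex.sub(replacement, line)


def merge(doc, row):
    for pattern, column in mapfields.items():
        replace_in_place(doc, re.compile(pattern), str(row[column]))
    return list(doc)  # snapshot of what would be saved for this row


# Buggy: one template object shared by every row, as in the question.
shared = load_template()
buggy = [merge(shared, row) for row in rows]

# Fixed: a fresh template for each row, as in the solution.
fixed = [merge(load_template(), row) for row in rows]

print(buggy)  # both entries contain the first row's values
print(fixed)  # each entry contains its own row's values
```

In the shared case the second call receives the correct row, but the placeholders are already gone, so the output still shows the first row's values. That is consistent with the file name being right while the document contents were not.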