When I use apply as shown below, the values passed as "row" are always those from the first row of the dataframe.
df.apply(make_word_file, axis=1)
Oddly, the file name created in document.save() is correct: newname has the correct values from row['case_name']. However, if I print(row), it prints the values from the first row.
def make_word_file(row):
    for key, value in mapfields.items():
        # print(row)
        regex1 = re.compile(key)
        replace1 = str(row[value])
        docx_replace_regex(document, regex1, replace1)
    # Raw string so the backslash is kept literally, without an invalid-escape warning.
    newname = remove(row['case_name'], r'\/:*?"<>|,.')
    print(newname)
    document.save(datadir + row["datename"] + "_" + row["court"] + "_" + newname + ".docx")
I expected print(row) to print the values from each row in the dataframe, not just the first.
EDIT for clarity:
This script is a mail merge which makes .docx Word files. mapfields is a dict in the format of regex: column name. document is a python-docx Document object.
mapfields = {
    "VARfname": "First Name",
    "VARlname": "Last Name",
}
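For readers unfamiliar with this kind of mapping, here is a minimal sketch of how a regex-to-column dict can drive the substitution. It works on a plain string rather than a .docx file; fill_text and the sample row are hypothetical names used only for illustration, not part of the original script.

```python
import re

# Same shape as the mapfields dict above: placeholder regex -> column name.
mapfields = {
    "VARfname": "First Name",
    "VARlname": "Last Name",
}

def fill_text(text, row):
    """Replace every placeholder in text with the matching value from row."""
    for pattern, column in mapfields.items():
        # A callable replacement stops backslashes in the data from being
        # interpreted as regex escape sequences.
        text = re.sub(pattern, lambda _m, v=str(row[column]): v, text)
    return text

# Hypothetical row; a pandas Series passed by df.apply(..., axis=1)
# is indexed by column name in the same way.
row = {"First Name": "Ada", "Last Name": "Lovelace"}
filled = fill_text("Dear VARfname VARlname,", row)
print(filled)
```

Because fill_text returns a new string instead of changing its input, the template text stays reusable for the next row.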
This ended up being a loop/python-docx issue, not a pandas one. The document object was created once and modified in place, so the first row's replacements overwrote the placeholders, leaving nothing for the regex to find on later rows. Loading the document template inside the function fixed the issue.
def make_word_file(case_row):
    # Load a fresh copy of the template for every row so the placeholders are intact.
    document = Document(directory + fname)
    for key, value in mapfields.items():
        regex1 = re.compile(key)
        replace1 = str(case_row[value])
        docx_replace_regex(document, regex1, replace1)
    document.save(location + ".docx")
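The effect can be reproduced without python-docx or pandas. The sketch below is only an illustration under stated assumptions: a mutable list of strings stands in for the Document object, fill_in_place is a hypothetical stand-in for docx_replace_regex, and copy.deepcopy plays the role of reloading the template from disk.

```python
import copy
import re

mapfields = {"VARfname": "First Name", "VARlname": "Last Name"}

# Stand-in for a python-docx Document: a mutable list of paragraph strings.
TEMPLATE = ["Dear VARfname VARlname,"]

def fill_in_place(paragraphs, row):
    """Rewrite the paragraphs of the object it is given, then return it."""
    for i, text in enumerate(paragraphs):
        for pattern, column in mapfields.items():
            text = re.sub(pattern, lambda _m, v=str(row[column]): v, text)
        paragraphs[i] = text
    return paragraphs

rows = [
    {"First Name": "Ada", "Last Name": "Lovelace"},
    {"First Name": "Alan", "Last Name": "Turing"},
]

# Buggy: one shared object is reused, as with a module-level document.
shared = copy.deepcopy(TEMPLATE)
buggy = [list(fill_in_place(shared, row)) for row in rows]

# Fixed: each row gets a fresh copy, analogous to calling Document(...)
# inside the function.
fixed = [fill_in_place(copy.deepcopy(TEMPLATE), row) for row in rows]

print(buggy)  # every entry carries the first row's values
print(fixed)  # each entry carries its own row's values
```

In the buggy run the second row finds no placeholders left to replace, so every output repeats the first row's values even though the file name, which is computed from the row directly, is correct.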