Essentially I need to translate one excel document into another.
Both sheets are formatted differently, but contain most of the same information - however in sheet 1, some data is formatted in a different way.
For Example "Name" in Sheet 1, and "First Name" "Last Name" in Sheet 2
is it possible to have my script do this for me? What about looking for seperators like a comma to split "Address" into "Street" "city" "state" "zip" -- or is that best done post-translate using Excel tools.
I've been able to read rows directly using openpyxl with code like this:
step = 2
read_start_row = 4
write_start_row = 3
amount_of_rows = 30
for i in range(0, amount_of_rows, step):
#copy from wb1
c = ws1.cell(row=read_start_row+i, column=4)
#paste in ws2
ws2.cell(row=write_start_row+(i/step), column=4, value=c.value)
but not sure where to start when trying to also alter the data.
This is a quick example if just copying the data.
The Example sheet has two columns; 'Name' and 'Address' where;
Name includes "First" and "Last" name separated by space
Address includes "Street" "City" "State" and "Zip" separated by comma
The example code reads the source Excel sheet splits the two columns into the constituents and writes the result back to the destinaton sheet
import pandas as pd
### Read data from source sheet
df = pd.read_excel('source.xlsx', sheet_name='Sheet1')
### Split the necessary columns on the delimiters
df[['First', 'Last']] = df['Name'].str.split(' ', n=1, expand=True) # Delimiter is space
df[['Street', 'City', 'State', 'Zip']] = df['Address'].str.split(', ', n=3, expand=True) # Delimiter is comma
### Drop the now unnecessary columns
df = df.drop(['Name', 'Address'], axis=1)
### Reorder columns, probably not needed in this case but ensures the columns are in correct order.
df = df[['First', 'Last', 'Street', 'City', 'State', 'Zip']]
### Write to the destination sheet, start row 2 (startrow 1), drop index and headers
with pd.ExcelWriter('dest.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
df.to_excel(writer, sheet_name="Sheet1", startrow=1, index=False, header=False)
Resultant dataframe
First Last Street City State Zip
0 Mavis West 421 E DRACHMAN TUCSON AZ 85705-7598
1 John Spurs 100 MAIN ST SEATTLE WA 98104
2 Jack East 105 KROME AVE MIAMI FL 33185 3700
In this example the destination Sheet contains the headers already on row 1 (note this is row 0 for to_excel). For this since writing to an existing sheet which already contains data (Headers) we use 'mode=a' (append) which requires Openpyxl as the engine.
The to_excel write excludes the Headers from the dataframe being they already exist, however if preferred the destination sheet could be empty and the headers written along with the columns.