I am trying to read 2 xlsx files and trying to append the data to a single xls file. Sample input :
data in f1 :
id na ln
a1 b1 c1
a2 b2 c2
k1 k2 k3
data in f2 :
jd kd rd md nd pd
a1 b1 c1 d1 e1 f1
a2 b2 c2 d2 e2 f2
a3 b3 c3 d3 e3 f3
The first row is the column names. I used the below code to merge based on the columns.
import pandas as pd
df1 = pd.read_excel('f1.xlsx',usecols=[0,1,2],header=None)
df2 = pd.read_excel('f2.xlsx',usecols=[0,4,5],header=None)
merge_df = pd.concat([df1 ,df2])
When I execute this , I am getting the 5 columns , where as I am expecting 3 columns. If I create a new file from f2 with 3 columns and use usecols=[0,1,2] , I am getting 3 columns. What's wrong in using different columns for the merge?
Hi, here is the print output from merge_df
0 1 2 4 5
0 id na ln NaN NaN
1 a1 b1 c1 NaN NaN
2 a2 b2 c2 NaN NaN
3 k1 k2 k3 NaN NaN
0 jd NaN NaN nd pd
1 a1 NaN NaN e1 f1
2 a2 NaN NaN e2 f2
3 a3 NaN NaN e3 f3
df2 :
0 4 5
0 jd nd pd
1 a1 e1 f1
2 a2 e2 f2
3 a3 e3 f3
The columns get the numbers that you use for importing. I.e. df1
will have column "names" 0, 1 and 2. df2
wil have column names 0, 4 and 5.
You want to rename the columns like so:
df2 = df2.rename(columns={4: 1, 5: 2})
This will give you the output
0 1 2
0 id na ln
1 a1 b1 c1
2 a2 b2 c2
3 k1 k2 k3
0 jd nd pd
1 a1 e1 f1
2 a2 e2 f2
3 a3 e3 f3
This is probably still not what you want, since id
, na
and ln
are the headers, because you wrote
The first row is the column names.
You probably want something like
df1 = pd.read_excel('f1.xlsx', usecols=[0, 1, 2], header=0) # Use headers
df2 = pd.read_excel('f2.xlsx', usecols=[0, 4, 5], header=0) # Use headers
df2 = df2.rename(columns={"jd": "id", "nd": "na", "pd":"ln"}) # Adapt headers to match