I have two DataFrames with the same number of rows - df1
like so:
date hour var1
a 2017-05-01 00:00:00 456585
b 2017-05-01 01:00:00 899875
c 2017-05-01 02:00:00 569566
d 2017-05-01 03:00:00 458756
e 2017-05-01 04:00:00 231458
f 2017-05-01 05:00:00 986545
and df2
like so:
MyVar1 MyVar2
0 6169.719338 3688.045368
1 5861.148007 3152.238704
2 5797.053347 2700.469871
3 5779.102340 2730.471948
4 6708.219647 3181.298291
5 8550.380343 3793.580394
I want to merge the data from the date
and hour
columns of df1
into df2
, to get a result like:
MyVar1 MyVar2 date hour
0 6169.719338 3688.045368 2017-05-01 00:00:00
1 5861.148007 3152.238704 2017-05-01 01:00:00
2 5797.053347 2700.469871 2017-05-01 02:00:00
3 5779.102340 2730.471948 2017-05-01 03:00:00
4 6708.219647 3181.298291 2017-05-01 04:00:00
5 8550.380343 3793.580394 2017-05-01 05:00:00
I tried simply assigning the columns like so:
df2['date'] = df1['date']
df2['hour'] = df1['hour']
but I get a result with NaN values in the date
and hour
columns instead:
MyVar1 MyVar2 date hour
0 6169.719338 3688.045368 NaN NaN
1 5861.148007 3152.238704 NaN NaN
2 5797.053347 2700.469871 NaN NaN
Why does this happen? How can I simply assign the values such that the data from the first row of df1
is shown in the first row of df2
, etc.?
Your DataFrames' indexes are different (and correspondingly, the indexes for each columns), so when trying to assign a column of one DataFrame to another, pandas will try to align the indexes, and failing to do so, insert NaNs.
Consider the following examples to understand what this means:
# Setup
A = pd.DataFrame(index=['a', 'b', 'c'])
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
C = pd.DataFrame(index=[1, 2, 3])
# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index B.index
a
b b (overlap)
c c (overlap)
d
f
# Example of unalignable indexes - A & C (no overlap at all)
A.index C.index
a
b
c
1
2
3
When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.
If you're working on an IPython notebook, you can check that this is indeed the root cause using,
df1.index.equals(df2.index)
# False
df1.index.intersection(df2.index).empty
# True
You can use any of the following solutions to fix this issue.
This solution will only work if the lengths of the two DataFrames match (i.e. have same number of rows).
df2['date'] = df1['date'].to_numpy()
# (In pandas < 0.24, do df2['date'] = df1['date'].values )
To assign multiple columns easily, use,
df2[['date', 'hour']] = df1[['date', 'hour']].to_numpy()
For when the two DataFrames have different lengths. Also maybe preferable if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.
# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df))
# Homogenize the index values,
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]
If you want to keep the existing index, but as a column, you may use reset_index()
instead.