I'm trying to join two dataframes - one with multiindex columns and the other with a single column name. They have similar index.
I get the following warning:
"UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)"
For example:
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
np.random.seed(2022) # so the data is the same each time
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df2 = pd.DataFrame(np.random.randn(3), index=['A', 'B', 'C'],columns=['w'])
df3 = df.join(df2)
df
first bar baz foo qux
second one two one two one two one two
A -0.000528 -0.274901 -0.139286 1.984686 0.282109 0.760809 0.300982 0.540297
B 0.373497 0.377813 -0.090213 -2.305943 1.142760 -1.535654 -0.863752 1.016545
C 1.033964 -0.824492 0.018905 -0.383344 -0.304185 0.997292 -0.127274 -1.475886
df2
w
A -1.940906
B 0.833649
C -0.567218
df3
- Result (bar, one) (bar, two) (baz, one) (baz, two) (foo, one) (foo, two) (qux, one) (qux, two) w
A -0.000528 -0.274901 -0.139286 1.984686 0.282109 0.760809 0.300982 0.540297 -1.940906
B 0.373497 0.377813 -0.090213 -2.305943 1.142760 -1.535654 -0.863752 1.016545 0.833649
C 1.033964 -0.824492 0.018905 -0.383344 -0.304185 0.997292 -0.127274 -1.475886 -0.567218
df.join(df2)
from pandas v1.3.0
results in a FutureWarning
FutureWarning: merging between different levels is deprecated and will be removed in a future version. (2 levels on the left, 1 on the right) df3 = df.join(df2)
.What is the best way to join these two dataframes?
It depends on what you want! Do you want the column from df2
to be aligned with the 1st or second level of columns from df
?
You have to add a level to the columns of df2
Super cheezy with pd.concat
df.join(pd.concat([df2], axis=1, keys=['a']))
Better way
df2.columns = pd.MultiIndex.from_product([['a'], df2.columns])
df.join(df2)