Search code examples
pythonpandasjoinconcatenation

pandas: how to merge columns irrespective of index


I have two dataframes with meaningless index's, but carefully curated order and I want to merge them while preserving that order. So, for example:

>>> df1
   First
a      1
b      3

and

>>> df2 
c       2
d       4

After merging, what I want to obtain is this:

>>> Desired_output
                    First  Second
AnythingAtAll           1       2     # <--- Row Names are meaningless.
SeriouslyIDontCare      3       4     # <--- But the ORDER of the rows must be preserved.

Every version of "join" I've seen requires me to manually reset indices. I thought concat would work, but I get this:

>>> pd.concat( [df1, df2] , axis = 1, ignore_index= True )
     0    1
a  1.0  NaN
b  3.0  NaN
c  NaN  2.0
d  NaN  4.0
# ^ obviously not what I want.

Even when I explicitly declare ignore_index.

How do I "overrule" the indexing and force the columns to be merged with the rows kept in the exact order that I supply them?


Edit: Thanks to the suggestion from jsmart and topsail, you can dereference the indices by directly accessing the values in the column:

df1["second"]=df2["Second"].values
>>> df1
   First  second
a      1       2
b      3       4

^ Solution


Solution

  • This should also work I think:

    df1["second"] = df2["second"].values
    

    It would keep the index from the first dataframe, but since you have values in there such as "AnyThingAtAll" and "SeriouslyIdontCare" I guess any index values whatsoever are acceptable.

    Basically, we are just adding a the values from your series as a new column to the first dataframe.

    Here's a test example similar to your described problem:

    # -----------
    # sample data
    # -----------
    df1 = pd.DataFrame(
    {
        'x': ['a','b'],
        'First': [1,3],
    })
    df1.set_index("x", drop=True, inplace=True)
    df2 = pd.DataFrame(
    {
        'x': ['c','d'],
        'Second': [2, 4],
    })
    df2.set_index("x", drop=True, inplace=True)
    
    
    # ---------------------------------------------
    # Add series as a new column to first dataframe
    # ---------------------------------------------
    df1["Second"] = df2["Second"].values
    

    Result is:

    First Second
    a 1 2
    b 3 4