Search code examples
pythonpandasmergevalueerror

Is there a way to know which columns have different datatypes to be able to merge? Python - Pandas


So I have 2 pretty large df's and I got some ValueErrors.

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

I checked them one by one(30 columns, oof) and ended up solving the issue, but I wounder if there is any code that can give me the specific column(s) that are preventing the merge... it could be very usefull to me since I am working on other people's dataframes and stuff like this happens all the time.

Example:

df1 (4.000.000,30) df2 (2.000,120)


Join= pd.merge(df1,df2,how='left',on['Name_of_Column1','Name_of_Column2','Name_of_Column3','Name_of_Column4'])

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

They have only some of the columns in common, can I compare the dtypes of the dfs somehow to know what columns have different dtypes?

Thanks in advance!


Solution

  • Assuming you have 2 data frames with similar column names, and are only interested in knowing which columns don't have the same dtype across both data frames so that you do a merge in peace, you could try the following:

    df1 = pd.DataFrame([[1,2,'3',4],
                       [1,2,'3',4],
                       [1,2,'3',4]], columns=['a','b','c','d'])
    
    df2 = pd.DataFrame([[5,5,'5','5'],
                       [5,5,'5','5'],
                       [5,5,'5','5']], columns=['a','b','c','d'])
    

    Taking these 2 data frames where the 'd' column has a dtype mismatch

    def dtype_checker(df1, df2):
        df1_types = dict(df1.dtypes)
        df2_types = dict(df2.dtypes)
        for col_name in df1.columns:
            assert df1_types[col_name] == df2_types[col_name], 'dtype mismatch in '+col_name+' column'
    
    dtype_checker(df1, df2)
    

    Which will result in the output:

    AssertionError: dtype mismatch in d column