So I have 2 pretty large df's and I got some ValueErrors.
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
I checked them one by one (30 columns, oof) and ended up solving the issue, but I wonder if there is any code that can give me the specific column(s) that are preventing the merge... it would be very useful to me since I am working on other people's dataframes and stuff like this happens all the time.
Example:
df1 shape: (4,000,000, 30), df2 shape: (2,000, 120)
Join = pd.merge(df1, df2, how='left', on=['Name_of_Column1', 'Name_of_Column2', 'Name_of_Column3', 'Name_of_Column4'])
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
They only have some columns in common. Can I compare the dtypes of the two dfs somehow to find out which columns have different dtypes?
Thanks in advance!
Assuming you have 2 data frames with similar column names, and are only interested in knowing which columns don't have the same dtype across both data frames so that you can merge in peace, you could try the following:
import pandas as pd

df1 = pd.DataFrame([[1, 2, '3', 4],
                    [1, 2, '3', 4],
                    [1, 2, '3', 4]], columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame([[5, 5, '5', '5'],
                    [5, 5, '5', '5'],
                    [5, 5, '5', '5']], columns=['a', 'b', 'c', 'd'])
Take these two data frames, where the 'd' column has a dtype mismatch (int64 in df1, object in df2):
def dtype_checker(df1, df2):
    # Map each column name to its dtype in each frame
    df1_types = dict(df1.dtypes)
    df2_types = dict(df2.dtypes)
    # Fail on the first column whose dtypes disagree
    for col_name in df1.columns:
        assert df1_types[col_name] == df2_types[col_name], 'dtype mismatch in ' + col_name + ' column'

dtype_checker(df1, df2)
Which will result in the output:
AssertionError: dtype mismatch in d column
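Since your two frames only share some of their columns and you want every offending column reported rather than stopping at the first failed assert, a small variant along these lines should also work (dtype_mismatches is just an illustrative name, not a pandas built-in):

def dtype_mismatches(df1, df2):
    # Only compare columns that exist in both frames
    shared = df1.columns.intersection(df2.columns)
    mismatched = [col for col in shared if df1[col].dtype != df2[col].dtype]
    # Return a small frame listing the conflicting dtypes side by side
    return pd.DataFrame({'df1_dtype': df1.dtypes[mismatched],
                         'df2_dtype': df2.dtypes[mismatched]})

print(dtype_mismatches(df1, df2))
#   df1_dtype df2_dtype
# d     int64    object

Any column that shows up in the result is one you would need to cast (e.g. with astype) before the merge will go through.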