I have been searching for this error but could not find anything relating np.setdiff1d to this TypeError. It would really help if you could tell me why the error occurs and how I can resolve it. Below is my sample code snippet:
import pandas as pd
import numpy as np
data1 = {'a' : [32,156], 'b' :[56,177]}
data2 = {'c' : [12,32,12,45,32,45], 'd' :[11,56,76,43,44,45], 'e': [111,156,176,143,144,145], 'f':[411,456,476,443,444,445] }
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
## converting to array
npdf1= df1.to_records(index=False)
npdf2= df2.to_records(index=False)
diff = np.setdiff1d(npdf1,npdf2[['c','e']])
# Above line gave error "TypeError: Cannot compare structured arrays unless they have a common dtype. I.e. `np.result_type(arr1, arr2)` must be defined."
# npdf1 >> gives below
# rec.array([( 32, 56), (156, 177)],
# dtype=[('a', '<i8'), ('b', '<i8')])
# npdf2[['c','e']] >> gives below
# rec.array([(12, 111), (32, 156), (12, 176), (45, 143), (32, 144),
# (45, 145)],
# dtype={'names': ['c', 'e'], 'formats': ['<i8', '<i8'], 'offsets': [0, 16], 'itemsize': 32})
## Above, the formats match (both i8), so I am not sure why the error occurs.
## As a workaround I tried converting the record arrays to normal numpy arrays
npdf1 = np.array(npdf1)
df2a = df2[['c','e']]
npdf2a = df2a.to_records(index=False)
npdf2a = np.array(npdf2a)
diff = np.setdiff1d(npdf1,npdf2a)
# Still get the error "TypeError: Cannot compare structured arrays unless they have a common dtype. I.e. `np.result_type(arr1, arr2)` must be defined."
Your recarray, converted to a list, is a list of tuples, which can be made into a set:
In [152]: npdf1
Out[152]:
rec.array([( 32, 56), (156, 177)],
dtype=[('a', '<i8'), ('b', '<i8')])
In [153]: npdf1.tolist()
Out[153]: [(32, 56), (156, 177)]
In [154]: s1=set(npdf1.tolist())
In [155]: s1
Out[155]: {(32, 56), (156, 177)}
Similarly for the 2 fields of the other frame; tolist removes the field names:
In [159]: s2=set(npdf2[['c','e']].tolist())
And then the ordinary set differences:
In [160]: s1.difference(s2)
Out[160]: {(32, 56), (156, 177)}
In [161]: s2.difference(s1)
Out[161]: {(12, 111), (12, 176), (32, 144), (32, 156), (45, 143), (45, 145)}
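Putting the steps above together, a minimal self-contained sketch could look like the following. The names s1, s2 and only_in_df1 are just illustrative, and it assumes that row order and duplicate rows do not matter, since sets discard both:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [32, 156], 'b': [56, 177]})
df2 = pd.DataFrame({'c': [12, 32, 12, 45, 32, 45],
                    'd': [11, 56, 76, 43, 44, 45],
                    'e': [111, 156, 176, 143, 144, 145],
                    'f': [411, 456, 476, 443, 444, 445]})

# tolist() turns each record into a plain tuple and drops the field names,
# so rows from the two frames can be compared by value alone.
s1 = set(df1.to_records(index=False).tolist())
s2 = set(df2[['c', 'e']].to_records(index=False).tolist())

# Rows of df1 that do not appear among the (c, e) pairs of df2.
only_in_df1 = s1 - s2
print(only_in_df1)
```

If the result is needed as a frame again, something like pd.DataFrame(sorted(only_in_df1), columns=df1.columns) should work.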
The module imported with
import numpy.lib.recfunctions as rf
has various functions to play with recarrays (and structured arrays), including structured_to_unstructured and rename_fields. But I don't think those are needed here.
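For completeness, here is a hedged sketch of staying with np.setdiff1d. As far as I can tell, the error comes from the field names differing ('a','b' versus 'c','e') rather than from the i8 formats, so renaming the fields to match should give the two arrays a common dtype. The names arr1, arr2 and diff_rows are illustrative, and selecting the columns in pandas before to_records is my choice to keep the dtype packed (no offsets):

```python
import numpy as np
import numpy.lib.recfunctions as rf
import pandas as pd

df1 = pd.DataFrame({'a': [32, 156], 'b': [56, 177]})
df2 = pd.DataFrame({'c': [12, 32, 12, 45, 32, 45],
                    'd': [11, 56, 76, 43, 44, 45],
                    'e': [111, 156, 176, 143, 144, 145]})

arr1 = df1.to_records(index=False)
# Pick the columns in pandas first so the record dtype has no padding/offsets.
arr2 = df2[['c', 'e']].to_records(index=False)
# Rename the fields so both arrays share the same field names.
arr2 = rf.rename_fields(arr2, {'c': 'a', 'e': 'b'})

# With matching dtypes, setdiff1d compares whole records.
diff = np.setdiff1d(arr1, arr2)
diff_rows = diff.tolist()
print(diff_rows)
```

The set approach is simpler; this variant is mainly useful if you want the result to remain a structured array.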