The following error arises when trying to add a relationship between two entities in Featuretools:
Unable to add relationship because ID in metadata is Pandas `dtype category` and ID in transactions is Pandas `dtype category`
Note, the Series are not necessarily the same cat.Codes
This error arises because the categories are different between the categorical variables you are trying to relate. In the code example below, all 3 Series are categoricals, but only s and s2 have the same dtype.
import pandas as pd
from pandas.api.types import is_dtype_equal
s = pd.Series(["a","b","a"], dtype="category")
s2 = pd.Series(["b","b","a"], dtype="category")
s3 = pd.Series(["a","b","c"], dtype="category")
is_dtype_equal(s.dtype, s2.dtype) # this is True
is_dtype_equal(s.dtype, s3.dtype) # this is False
To fix this, you need to update your dataframe before loading it into Featuretools so that the Pandas Categoricals have the same category values. Here's how you do that.
If s is missing categories from s3:
new_s = s.astype(s3.dtype)
is_dtype_equal(new_s.dtype, s3.dtype) # this is True
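One caveat is worth checking before casting this way. The sketch below uses illustrative Series named `left` and `target`, which are not part of the example above. It suggests that any value whose category is absent from the target dtype becomes a missing value after the cast, which is why the union approach is needed when each Series has categories the other lacks.

```python
import pandas as pd

# Illustrative Series; the names `left` and `target` are assumptions.
left = pd.Series(["a", "b", "a"], dtype="category")
target = pd.Series(["b", "c"], dtype="category")

# Categories present in `left` but absent from `target`.
missing = left.cat.categories.difference(target.cat.categories)

# Casting directly to target's dtype turns values in those categories into NaN.
cast = left.astype(target.dtype)
lost = int(cast.isna().sum())

print(list(missing), lost)
```

If `missing` is non-empty, a direct cast would drop data, so building a shared dtype from the union of both category sets is the safer choice.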
If each Series is missing categories from the other, we must take the union of the categories:
s4 = pd.Series(["b","c"], dtype="category")
categories = s.dtype.categories.union(s4.dtype.categories) # make union of categories
shared_dtype = pd.CategoricalDtype(categories=categories) # one dtype for both Series
new_s = s.astype(shared_dtype)
new_s4 = s4.astype(shared_dtype)
is_dtype_equal(new_s.dtype, new_s4.dtype) # this is True
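The same idea can be applied to whole dataframes before they are loaded into Featuretools. The following is a minimal sketch, not a Featuretools API: the helper name `align_categories` is hypothetical, and the frames `metadata` and `transactions` with an `ID` column are assumed from the error message above. It assumes the column is already categorical in both frames.

```python
import pandas as pd


def align_categories(left, right, column):
    """Return copies of both frames whose `column` shares one categorical dtype.

    Hypothetical helper; assumes `column` is already categorical in both frames.
    """
    # Union of the category labels found in either frame.
    union = left[column].cat.categories.union(right[column].cat.categories)
    shared = pd.CategoricalDtype(categories=union)
    return (
        left.assign(**{column: left[column].astype(shared)}),
        right.assign(**{column: right[column].astype(shared)}),
    )


# Example frames named after the error message; the data is made up.
metadata = pd.DataFrame({"ID": pd.Series(["a", "b"], dtype="category")})
transactions = pd.DataFrame({"ID": pd.Series(["b", "c", "b"], dtype="category")})

metadata, transactions = align_categories(metadata, transactions, "ID")
print(metadata["ID"].dtype == transactions["ID"].dtype)
```

Because the shared dtype contains every category from both frames, no existing value is converted to a missing value by the cast.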