
Unable to add relationship because dtypes don't match in Featuretools


The following error arises when trying to add a relationship between two entities in Featuretools:

Unable to add relationship because ID in metadata is Pandas `dtype category` and ID in transactions is Pandas `dtype category`

Note that the two Series do not necessarily have the same `cat.codes`.


Solution

  • This error arises because the two categorical variables you are trying to relate have different categories, and pandas treats categorical dtypes with different categories as different dtypes. In the code example below, all three Series are categorical, but only s and s2 have the same dtype.

    import pandas as pd
    from pandas.api.types import is_dtype_equal
    
    s = pd.Series(["a","b","a"], dtype="category")
    s2 = pd.Series(["b","b","a"], dtype="category")
    s3 = pd.Series(["a","b","c"], dtype="category")
    
    is_dtype_equal(s.dtype, s2.dtype) # this is True
    is_dtype_equal(s.dtype, s3.dtype) # this is False
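
To see which categories cause the mismatch, it can help to compare the category indexes directly. This is a minimal sketch that reuses s and s3 from the example above; the variable names only_in_s and only_in_s3 are made up for illustration.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["a", "b", "c"], dtype="category")

# Categories present in one Series but absent from the other
only_in_s3 = s3.cat.categories.difference(s.cat.categories)
only_in_s = s.cat.categories.difference(s3.cat.categories)

print(list(only_in_s3))  # ['c']
print(list(only_in_s))   # []
```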
    

    To fix this, update your DataFrames before loading them into Featuretools so that the pandas categoricals share the same categories. Here's how to do that.

    If s is missing categories from s3:

    new_s = s.astype(s3.dtype)
    is_dtype_equal(new_s.dtype, s3.dtype) # this is True
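
The direction of the cast matters. As a small illustration (the name narrowed is hypothetical), casting the Series with more categories to the dtype with fewer replaces any value outside the target categories with NaN, so this shortcut is only safe when one set of categories fully contains the other.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["a", "b", "c"], dtype="category")

# Casting the other way loses "c", because s.dtype has no such category
narrowed = s3.astype(s.dtype)

print(narrowed.isna().tolist())  # [False, False, True]
```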
    

    If each Series is missing categories from the other, cast both to the union of their categories:

    s4 = pd.Series(["b","c"], dtype="category")
    
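    # Index.union takes the set union; `+` on two Indexes is element-wise, not a union
    categories = s.dtype.categories.union(s4.dtype.categories)
    
    # astype("category", categories=...) is not accepted by recent pandas; use a CategoricalDtype
    union_dtype = pd.CategoricalDtype(categories=categories)
    new_s = s.astype(union_dtype)
    new_s4 = s4.astype(union_dtype)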
    
    is_dtype_equal(new_s.dtype, new_s4.dtype) # this is True
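
In practice the mismatch sits in a column of each DataFrame, not in standalone Series. The sketch below applies the union approach to an ID column in two frames before they are handed to Featuretools. The helper align_categories and the sample data are assumptions made up for illustration; only the names metadata, transactions and ID come from the error message.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def align_categories(left, right, column):
    """Hypothetical helper: cast `column` in both frames to one shared categorical dtype."""
    shared = left[column].cat.categories.union(right[column].cat.categories)
    shared_dtype = pd.CategoricalDtype(categories=shared)
    return (
        left.assign(**{column: left[column].astype(shared_dtype)}),
        right.assign(**{column: right[column].astype(shared_dtype)}),
    )


# Made-up stand-ins for the two entities named in the error message
metadata = pd.DataFrame({"ID": pd.Series(["a", "b"], dtype="category")})
transactions = pd.DataFrame({"ID": pd.Series(["b", "c", "b"], dtype="category")})

metadata, transactions = align_categories(metadata, transactions, "ID")

print(is_dtype_equal(metadata["ID"].dtype, transactions["ID"].dtype))  # True
```

Because the shared dtype contains every category from both columns, no existing value is turned into NaN by the cast.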