I'm writing a notebook using this data from Kaggle. Here's a screenshot of the two tables just to show we have ID columns in both.
Here's my code when trying to set up the Entity Set and add a relationship.
import featuretools as ft
import pandas as pd
es = ft.EntitySet()
es = es.add_dataframe(dataframe=train_sampled, index='new_index', dataframe_name='application', make_index=True)
es = es.add_dataframe(dataframe=bureau, index='new_index', dataframe_name='bureau', make_index=True)
new_relationship = ft.Relationship(entityset=es,
                                   parent_dataframe_name='application', parent_column_name='SK_ID_CURR',
                                   child_dataframe_name='bureau', child_column_name='SK_ID_CURR')
es = es.add_relationship(new_relationship)
And here's the error I'm getting that doesn't make any sense.
KeyError: 'DataFrame <Relationship: bureau.SK_ID_CURR -> application.SK_ID_CURR> does not exist in entity set'
The EntitySet exists, but I just can't add a relationship to it, which is the whole point of this.
Any advice or guidance is much appreciated.
EDIT: Solution. This code uses the answer below and also changes the index column of the bureau table to SK_ID_BUREAU, which is the column that is actually unique in that table.
es = ft.EntitySet()
es = es.add_dataframe(dataframe=train_sampled, index='SK_ID_CURR', dataframe_name='application', make_index=False)
es = es.add_dataframe(dataframe=bureau, index='SK_ID_BUREAU', dataframe_name='bureau', make_index=False)
new_relationship = ft.Relationship(entityset=es,
                                   parent_dataframe_name='application', parent_column_name='SK_ID_CURR',
                                   child_dataframe_name='bureau', child_column_name='SK_ID_CURR')
es = es.add_relationship(relationship=new_relationship)
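As a rough sanity check on the index choice, something like the following can show which column is unique in each table. This is only a sketch: the rows are made up, the helper `unique_columns` is a hypothetical name, and only the column names SK_ID_CURR and SK_ID_BUREAU come from the real tables.

```python
import pandas as pd

# Toy stand-ins for the real tables; the rows are invented for illustration.
application = pd.DataFrame({"SK_ID_CURR": [100001, 100002, 100003]})
bureau = pd.DataFrame({
    "SK_ID_BUREAU": [5001, 5002, 5003, 5004],
    "SK_ID_CURR": [100001, 100001, 100002, 100003],  # repeats: one applicant, many bureau rows
})


def unique_columns(df):
    """Return the names of columns whose values are all distinct and non-null."""
    return [c for c in df.columns if df[c].is_unique and df[c].notna().all()]


app_candidates = unique_columns(application)
bureau_candidates = unique_columns(bureau)
print(app_candidates)
print(bureau_candidates)
```

In this toy data SK_ID_CURR repeats in bureau, so it can serve as the child column of the relationship but not as that table's index.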
If you are adding a relationship to an EntitySet by passing in a Relationship object, you need to make sure to use the relationship keyword in your call like this:
es.add_relationship(relationship=new_relationship)
Without using the relationship keyword, the method is expecting that you are passing in four values indicating parent_dataframe_name, parent_column_name, child_dataframe_name, child_column_name. Using this approach you could alternatively skip creating the Relationship object and add the relationship like this:
es.add_relationship('application', 'SK_ID_CURR', 'bureau', 'SK_ID_CURR')
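To make the positional-versus-keyword behaviour concrete, here is a simplified stand-in written in plain Python. It is not the Featuretools source: `ToyEntitySet`, its body, and the returned strings are hypothetical, and the parameter order is assumed from the description above. It only illustrates how a Relationship passed positionally ends up being treated as a dataframe name.

```python
class ToyEntitySet:
    """Hypothetical stand-in that mimics only the argument order described above."""

    def __init__(self):
        # Pretend these two dataframes were already added.
        self.dataframes = {"application": object(), "bureau": object()}

    def add_relationship(self, parent_dataframe_name=None, parent_column_name=None,
                         child_dataframe_name=None, child_column_name=None,
                         relationship=None):
        if relationship is not None:
            return "added from Relationship object"
        # Without the keyword, the first positional value is looked up as a dataframe name.
        if parent_dataframe_name not in self.dataframes:
            raise KeyError(f"DataFrame {parent_dataframe_name} does not exist in entity set")
        return "added from four names"


rel = object()  # placeholder for a Relationship object
toy_es = ToyEntitySet()

try:
    toy_es.add_relationship(rel)  # positional: bound to parent_dataframe_name
    positional_result = "no error"
except KeyError:
    positional_result = "KeyError"

keyword_result = toy_es.add_relationship(relationship=rel)
names_result = toy_es.add_relationship("application", "SK_ID_CURR", "bureau", "SK_ID_CURR")
print(positional_result, keyword_result, names_result, sep="\n")
```

The positional call fails with a KeyError that mentions the relationship itself, which matches the shape of the error message in the question.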
Finally, you can also use the EntitySet.add_relationships method to add your relationship, which allows you to add one or more relationships to an EntitySet by passing in a list of Relationship objects:
es.add_relationships([new_relationship])
For more details on all of these methods and their expected arguments, you can always refer to the Featuretools API Reference.