I am fairly new to featuretools, and trying to understand if and how one can add interesting values to an entity set generated using multiple features.
For example, I have an entity set with two entities: customers and transactions. Transactions can be debit or credit (d_c) and can occur across different spending categories (tran_category) - restaurants, clothing, groceries, etc.
Thus far, I am able to create interesting values for either of these features but not from a combination of them:
import featuretools as ft
x = ft.EntitySet()
x.entity_from_dataframe(entity_id = 'customers', dataframe = customer_ids, index = 'cust_id')
x.entity_from_dataframe(entity_id = 'transactions', dataframe = transactions, index = 'tran_id', time_index = 'transaction_date')
x_rel = ft.Relationship(x['customers']['cust_id'], x['transactions']['cust_id'])
x.add_relationship(x_rel)
x['transactions']['d_c'].interesting_values = ['D', 'C']
x['transactions']['tran_category'].interesting_values = ['restaurants', 'clothing', 'groceries']
How can I add an interesting value that combines values from d_c AND tran_category (i.e. restaurant debits, grocery credits, clothing debits, etc.)? The goal is to then use these interesting values to aggregate across transaction amounts, time between transactions, etc., using where_primitives:
feature_matrix, feature_defs = ft.dfs(entityset = x, target_entity = 'customers', agg_primitives = list_of_agg_primitives, where_primitives = list_of_where_primitives, trans_primitives = list_of_trans_primitives, max_depth = 3)
Currently, there is no way to do that.
One approach would be to create a new column, d_c__tran_category, that has all the possible combinations of d_c and tran_category, and then add interesting values to that column:
x['transactions']['d_c__tran_category'].interesting_values = ['D_restaurants', 'C_restaurants', 'D_clothing', 'C_clothing', 'D_groceries', 'C_groceries']
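The combined column has to exist in the transactions dataframe before it is loaded into the entity set. A minimal pandas sketch of that step follows; the sample rows are made up for illustration, and only the column names (d_c, tran_category, d_c__tran_category) come from the question and answer above.

```python
from itertools import product

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
transactions = pd.DataFrame({
    "tran_id": [1, 2, 3, 4],
    "cust_id": [10, 10, 11, 11],
    "d_c": ["D", "C", "D", "C"],
    "tran_category": ["restaurants", "groceries", "clothing", "restaurants"],
})

# Join the two categorical columns into one, e.g. "D_restaurants".
transactions["d_c__tran_category"] = (
    transactions["d_c"] + "_" + transactions["tran_category"]
)

# Build every debit/credit x category combination to use as interesting values.
all_combinations = [
    f"{dc}_{cat}"
    for dc, cat in product(["D", "C"], ["restaurants", "clothing", "groceries"])
]
```

After this, the dataframe can be passed to entity_from_dataframe as in the question, and all_combinations (or whichever subset matters) can be assigned to the interesting_values of the new column.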