How to create interesting values using value combinations from multiple features/columns


I am fairly new to featuretools and am trying to understand whether, and how, one can add interesting values to an entity set when those values are built from multiple features.

For example, I have an entity set with two entities: customers and transactions. Transactions can be debit or credit (d_c) and can occur across different spending categories (tran_category): restaurants, clothing, groceries, etc.

Thus far, I am able to create interesting values for either of these features but not from a combination of them:

import featuretools as ft

x = ft.EntitySet()

x.entity_from_dataframe(entity_id = 'customers', dataframe = customer_ids, index = 'cust_id')
x.entity_from_dataframe(entity_id = 'transactions', dataframe = transactions, index = 'tran_id', time_index = 'transaction_date')

x_rel = ft.Relationship(x['customers']['cust_id'], x['transactions']['cust_id'])
x.add_relationship(x_rel)

x['transactions']['d_c'].interesting_values = ['D', 'C']
x['transactions']['tran_category'].interesting_values = ['restaurants', 'clothing', 'groceries']

How can I add an interesting value that combines values from d_c AND tran_category (e.g., restaurant debits, grocery credits, clothing debits)? The goal is to then use these interesting values to aggregate across transaction amounts, time between transactions, etc., using where_primitives:

feature_matrix, feature_defs = ft.dfs(entityset = x,
                                      target_entity = 'customers',
                                      agg_primitives = list_of_agg_primitives,
                                      where_primitives = list_of_where_primitives,
                                      trans_primitives = list_of_trans_primitives,
                                      max_depth = 3)

Solution

  • Currently, there is no built-in way to define a single interesting value that spans more than one column.

    One approach is to create a new column, d_c__tran_category, in the transactions dataframe that holds the combination of d_c and tran_category for each row, and then add interesting values to that column:

    x['transactions']['d_c__tran_category'].interesting_values = ['D_restaurants', 'C_restaurants', 'D_clothing', 'C_clothing','D_groceries', 'C_groceries']
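
    The workaround above could be sketched roughly as follows. This is only an illustration, not an official featuretools recipe. The sample dataframe and the names d_c_values, categories, and combined_values are assumptions made for the example, and the '_' separator is chosen to match the values listed in the answer. The combined column is built with pandas before the dataframe is handed to entity_from_dataframe.

```python
from itertools import product

import pandas as pd

# Hypothetical sample data standing in for the question's `transactions` dataframe.
transactions = pd.DataFrame({
    'tran_id': [1, 2, 3, 4],
    'cust_id': [10, 10, 11, 11],
    'd_c': ['D', 'C', 'D', 'D'],
    'tran_category': ['restaurants', 'groceries', 'clothing', 'restaurants'],
})

# Build the combined column row by row: e.g. 'D' + '_' + 'restaurants'.
# Do this before passing the dataframe to entity_from_dataframe.
transactions['d_c__tran_category'] = (
    transactions['d_c'] + '_' + transactions['tran_category']
)

# Generate every d_c / tran_category pairing instead of typing the list by hand.
d_c_values = ['D', 'C']
categories = ['restaurants', 'clothing', 'groceries']
combined_values = [f'{d_c}_{cat}' for cat, d_c in product(categories, d_c_values)]

# Once the entity set has been built from this dataframe, the list can be
# assigned exactly as in the answer:
# x['transactions']['d_c__tran_category'].interesting_values = combined_values
```

    Generating the list with itertools.product keeps it in sync with the two underlying value lists, so adding a new category only requires changing one place.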