feature-engineering featuretools

How to record constants derived by FeatureTools when using Deep Feature Synthesis


When FeatureTools performs deep feature synthesis, is there a way for it to record constant values it has derived?

For example, I have a dataframe with many rows like this:

| loan_id | loan_term |
|---------|:---------:|
| a | 12 |
| ... | ... |
| z | 18 |

Deep Feature Synthesis engineers features including <Feature: loan_term.COUNT(loan)>, like so:

| loan | loan_term | loan_term.COUNT(loan) |
|---------|:---------:|:---------------------:|
| a | 12 | 2000 |
| ... | ... | ... |
| z | 18 | 800 |

I would like to be able to re-engineer features from a single entity, so that a single loan term of 12 has a loan_term.COUNT(loan) of 2000 without having to re-count all of the loan_terms in the dataframe.*

I could do this by re-combining the entity with the training data and calling ft.calculate_feature_matrix(features, my_entity_set_with_one_new_entity_added), but this is inefficient and slow.

Is there a way to direct FeatureTools to record constants found during deep feature synthesis, and to use them for future feature generation?


*It's not important to me right now to include the single new loan entity in the calculation. So 12 does not have to become 2001.


Solution

  • Unfortunately, there is no way to do this as of Featuretools v0.3.1. You can accomplish it manually by doing the following.

    1. From the feature matrix produced by running Deep Feature Synthesis on the training data, select the columns you don't want to recalculate, such as loan_term.COUNT(loan), along with the key they depend on.
    2. Remove the features you selected in step 1 from your feature list, then calculate the remaining features on the new dataset.
    3. Join the dataframe from step 1 into the dataframe from step 2 on the appropriate key, in this case loan_term.

    You may have to make some tweaks based on the particulars of your dataset.
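The three steps above can be sketched with plain pandas. This is a minimal illustration and not a Featuretools feature: `train_fm` and `new_fm` are hypothetical stand-ins for the feature matrices you would get from the training data and the new data, and the values are made up to match the example in the question.

```python
import pandas as pd

# Hypothetical stand-in for the feature matrix computed on the training data.
train_fm = pd.DataFrame(
    {
        "loan_term": [12, 12, 18],
        "loan_term.COUNT(loan)": [2000, 2000, 800],
    },
    index=pd.Index(["a", "b", "z"], name="loan_id"),
)

# Step 1: keep the join key plus the columns that should not be recalculated,
# reduced to one row per distinct loan_term.
cached_cols = ["loan_term.COUNT(loan)"]
lookup = train_fm[["loan_term"] + cached_cols].drop_duplicates(subset="loan_term")

# Step 2: hypothetical feature matrix for the new data, calculated with the
# cached features removed from the feature list.
new_fm = pd.DataFrame(
    {"loan_term": [12, 24]},
    index=pd.Index(["new_1", "new_2"], name="loan_id"),
)

# Step 3: left-join the cached values onto the new rows by loan_term.
# reset_index/set_index keeps loan_id as the index across the merge.
result = (
    new_fm.reset_index()
    .merge(lookup, on="loan_term", how="left")
    .set_index("loan_id")
)
```

A loan term that never appeared in the training data (24 here) has no cached value, so the left join leaves it as NaN. How to fill it in, for example with 0, depends on your dataset.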