
How to retain feature columns after running DFS on an entity?


I tried the Featuretools example at the following URL: https://docs.featuretools.com/index.html

The customers dataframe has the following data:

    In [4]: customers_df
    Out[4]:
       customer_id zip_code           join_date date_of_birth
    0            1    60091 2011-04-17 10:48:33    1994-07-18
    1            2    13244 2012-04-15 23:31:04    1986-08-18

After creating a feature matrix for each customer in the data, around 73 features are created. However, the join_date and date_of_birth columns are not retained in feature_matrix_customers.

Questions:

1) Is there an option to retain the join_date and date_of_birth columns in feature_matrix_customers?

2) Featuretools DFS does not extract the time of day from join_date, so no hour, minute or second features are created. Is there a way to get hour, minute and second features similar to the year, month and day feature columns?


Solution

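  • To extract other date-related features (question 2), include additional transform primitives in your call to ft.dfs. The example below adds "hour"; the "minute" and "second" primitives can be added to trans_primitives in the same way.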

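    import featuretools as ft

    # Load the demo EntitySet (customers, sessions, transactions, products)
    es = ft.demo.load_mock_customer(return_entityset=True)

    # Featuretools 0.x API; in 1.x, target_entity became target_dataframe_name.
    # features_only=True returns feature definitions without computing values.
    features = ft.dfs(entityset=es,
                      target_entity="customers",
                      agg_primitives=["count", "sum", "mode"],
                      trans_primitives=["day", "hour", "weekend", "month", "year"],
                      features_only=True)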
    

    I used the features_only parameter, so this only returns feature definitions. The features variable now looks like this:

    [<Feature: zip_code>,
     <Feature: COUNT(transactions)>,
     <Feature: DAY(date_of_birth)>,
     <Feature: WEEKEND(join_date)>,
     <Feature: COUNT(sessions)>,
     <Feature: WEEKEND(date_of_birth)>,
     <Feature: HOUR(date_of_birth)>,
     <Feature: DAY(join_date)>,
     <Feature: MODE(sessions.device)>,
     <Feature: SUM(transactions.amount)>,
     <Feature: YEAR(join_date)>,
     <Feature: HOUR(join_date)>,
     <Feature: YEAR(date_of_birth)>,
     <Feature: MONTH(join_date)>,
     <Feature: MONTH(date_of_birth)>,
     <Feature: MODE(transactions.product_id)>,
     <Feature: MODE(sessions.MODE(transactions.product_id))>,
     <Feature: MODE(sessions.MONTH(session_start))>,
     <Feature: MODE(sessions.DAY(session_start))>,
     <Feature: MODE(sessions.YEAR(session_start))>,
     <Feature: MODE(sessions.HOUR(session_start))>]
    

    By default, DFS only returns numeric, categorical and boolean features, so the raw datetime columns have to be added back manually as identity features:

    # Identity features pass the raw datetime values through unchanged
    features += [ft.Feature(es["customers"]["join_date"]),
                 ft.Feature(es["customers"]["date_of_birth"])]
    

    Now we can calculate the features on the actual data:

    fm = ft.calculate_feature_matrix(entityset=es, features=features)
    

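    This returns a feature matrix with join_date and date_of_birth as the last columns. (The output below also contains MEAN features, so it appears to have been generated with "mean" among the aggregation primitives.)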

                zip_code  COUNT(transactions)  DAY(date_of_birth)  WEEKEND(join_date)  COUNT(sessions)  WEEKEND(date_of_birth)  HOUR(date_of_birth)  DAY(join_date) MODE(sessions.device)  SUM(transactions.amount)  YEAR(join_date)  HOUR(join_date)  YEAR(date_of_birth)  MONTH(join_date)  MEAN(transactions.amount)  MODE(transactions.product_id)  MONTH(date_of_birth)  MEAN(sessions.COUNT(transactions))  MODE(sessions.MODE(transactions.product_id))  MEAN(sessions.MEAN(transactions.amount))  MODE(sessions.MONTH(session_start))  MODE(sessions.DAY(session_start))  MEAN(sessions.SUM(transactions.amount))  MODE(sessions.YEAR(session_start))  MODE(sessions.HOUR(session_start))  SUM(sessions.MEAN(transactions.amount))           join_date date_of_birth
    customer_id
    1              60091                  126                  18                True                8                   False                    0              17                mobile                   9025.62             2011               10                 1994                 4                  71.631905                              4                     7                           15.750000                                             4                                 72.774140                                    1                                  1                              1128.202500                                2014                                   6                               582.193117 2011-04-17 10:48:33    1994-07-18
    2              13244                   93                  18                True                7                   False                    0              15               desktop                   7200.28             2012               23                 1986                 4                  77.422366                              4                     8                           13.285714                                             3                                 78.415122                                    1                                  1                              1028.611429                                2014                                   3                               548.905851 2012-04-15 23:31:04    1986-08-18
    3              13244                   93                  21                True                6                   False                    0              13               desktop                   6236.62             2011               15                 2003                 8                  67.060430                              1                    11                           15.500000                                             1                                 67.539577                                    1                                  1                              1039.436667                                2014                                   5                               405.237462 2011-08-13 15:42:34    2003-11-21
    4              60091                  109                  15               False                8                   False                    0               8                mobile                   8727.68             2011               20                 2006                 4                  80.070459                              2                     8                           13.625000                                             1                                 81.207189                                    1                                  1                              1090.960000                                2014                                   1                               649.657515 2011-04-08 20:08:14    2006-08-15
    5              60091                   79                  28                True                6                    True                    0              17                mobile                   6349.66             2010                5                 1984                 7                  80.375443                              5                     7                           13.166667                                             3                                 78.705187                                    1                                  1                              1058.276667                                2014                                   0                               472.231119 2010-07-17 05:27:50    1984-07-28
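To make the datetime transform primitives concrete, here is a plain-Python sketch (no Featuretools involved) of the per-value computation that YEAR, MONTH, DAY, HOUR, MINUTE, SECOND and WEEKEND perform on join_date. The helper name datetime_components is hypothetical and only approximates the primitives; it is not Featuretools code. The inputs are the two join_date values from the customers table in the question.

```python
from datetime import datetime

# join_date values for customers 1 and 2, taken from the customers table in the question
JOIN_DATES = {
    1: "2011-04-17 10:48:33",
    2: "2012-04-15 23:31:04",
}


def datetime_components(value: str) -> dict:
    """Hypothetical helper approximating what the datetime transform
    primitives compute for one timestamp string."""
    ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return {
        "YEAR": ts.year,
        "MONTH": ts.month,
        "DAY": ts.day,
        "HOUR": ts.hour,
        "MINUTE": ts.minute,
        "SECOND": ts.second,
        # weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend
        "WEEKEND": ts.weekday() >= 5,
    }


# One row of derived features per customer, keyed by customer_id
features_by_customer = {cid: datetime_components(v) for cid, v in JOIN_DATES.items()}

if __name__ == "__main__":
    for cid, feats in features_by_customer.items():
        print(cid, feats)
```

The YEAR, MONTH, DAY, HOUR and WEEKEND values agree with the corresponding join_date columns for customers 1 and 2 in the feature matrix output above; MINUTE and SECOND are the extra components question 2 asks about.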
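An alternative for question 1 is to compute the feature matrix as usual and join the raw columns back on customer_id afterwards. Assuming fm is indexed by customer_id (as in the output above) and customers_df is the dataframe from the question, with pandas that would be roughly fm.join(customers_df.set_index("customer_id")[["join_date", "date_of_birth"]]). The plain-Python sketch below shows the same left join on a few values copied from the tables above; retain_columns is a hypothetical helper, not part of Featuretools.

```python
# A few feature-matrix values per customer_id, copied from the output above
feature_matrix = {
    1: {"zip_code": "60091", "COUNT(transactions)": 126, "HOUR(join_date)": 10},
    2: {"zip_code": "13244", "COUNT(transactions)": 93, "HOUR(join_date)": 23},
}

# Raw rows from the customers table in the question
customers = [
    {"customer_id": 1, "zip_code": "60091",
     "join_date": "2011-04-17 10:48:33", "date_of_birth": "1994-07-18"},
    {"customer_id": 2, "zip_code": "13244",
     "join_date": "2012-04-15 23:31:04", "date_of_birth": "1986-08-18"},
]


def retain_columns(fm, raw_rows, keep, key="customer_id"):
    """Hypothetical helper: left-join the `keep` columns of raw_rows onto
    each feature-matrix row, matching fm's keys against raw_rows[key]."""
    lookup = {row[key]: row for row in raw_rows}
    merged = {}
    for idx, feats in fm.items():
        raw = lookup.get(idx, {})
        row = dict(feats)  # copy so the input feature matrix is not mutated
        for col in keep:
            row[col] = raw.get(col)  # None when the key has no raw row
        merged[idx] = row
    return merged


result = retain_columns(feature_matrix, customers, ["join_date", "date_of_birth"])

if __name__ == "__main__":
    for idx, row in result.items():
        print(idx, row)
```

Like the identity-feature approach, this leaves join_date and date_of_birth as the last columns of each row.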