I tried the Featuretools example from the following URL: https://docs.featuretools.com/index.html
The customers dataframe has the following data:
In [4]: customers_df
Out[4]:
customer_id zip_code join_date date_of_birth
0 1 60091 2011-04-17 10:48:33 1994-07-18
1 2 13244 2012-04-15 23:31:04 1986-08-18
After creating a feature matrix for each customer in the data, around 73 features are created; however, the join_date and date_of_birth columns are not retained in feature_matrix_customers.
Questions:
1) Is there an option to retain the join_date and date_of_birth columns in feature_matrix_customers?
2) Featuretools DFS does not extract the time from join_date and does not create any features for hours, minutes or seconds. Is there a way to get hour, minute and second features, similar to the year, month and day feature columns?
To extract other features related to the date, you need to include additional transform primitives in your call to ft.dfs.
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
features = ft.dfs(entityset=es,
target_entity="customers",
agg_primitives=["count", "sum", "mode"],
trans_primitives=["day", "hour", "weekend", "month", "year"],
features_only=True)
I used the features_only parameter, so this only returns feature definitions. The features variable now looks like this:
[<Feature: zip_code>,
<Feature: COUNT(transactions)>,
<Feature: DAY(date_of_birth)>,
<Feature: WEEKEND(join_date)>,
<Feature: COUNT(sessions)>,
<Feature: WEEKEND(date_of_birth)>,
<Feature: HOUR(date_of_birth)>,
<Feature: DAY(join_date)>,
<Feature: MODE(sessions.device)>,
<Feature: SUM(transactions.amount)>,
<Feature: YEAR(join_date)>,
<Feature: HOUR(join_date)>,
<Feature: YEAR(date_of_birth)>,
<Feature: MONTH(join_date)>,
<Feature: MONTH(date_of_birth)>,
<Feature: MODE(transactions.product_id)>,
<Feature: MODE(sessions.MODE(transactions.product_id))>,
<Feature: MODE(sessions.MONTH(session_start))>,
<Feature: MODE(sessions.DAY(session_start))>,
<Feature: MODE(sessions.YEAR(session_start))>,
<Feature: MODE(sessions.HOUR(session_start))>]
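The list contains HOUR(join_date) but nothing finer-grained. For minutes and seconds, Featuretools also provides `minute` and `second` transform primitives alongside `hour` (run `ft.list_primitives()` to confirm they exist in your installed version), so adding them to `trans_primitives` should produce MINUTE(join_date) and SECOND(join_date) columns. To illustrate what these date-part primitives compute for a single value, here is a plain-Python sketch using only the standard library. The `datetime_parts` helper is hypothetical and is not how Featuretools implements the primitives; it also assumes WEEKEND means Saturday or Sunday, which matches the WEEKEND columns in the feature matrix output.

```python
from datetime import datetime

# Timestamps copied from the join_date column of customers_df.
JOIN_DATES = ["2011-04-17 10:48:33", "2012-04-15 23:31:04"]


def datetime_parts(value: str) -> dict:
    """Hypothetical helper: split a 'YYYY-MM-DD HH:MM:SS' string into the
    parts that the year/month/day/hour/minute/second primitives expose."""
    ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return {
        "YEAR": ts.year,
        "MONTH": ts.month,
        "DAY": ts.day,
        "HOUR": ts.hour,
        "MINUTE": ts.minute,
        "SECOND": ts.second,
        # Assumption: weekend = Saturday or Sunday; weekday() is 0 (Mon) .. 6 (Sun).
        "WEEKEND": ts.weekday() >= 5,
    }


for value in JOIN_DATES:
    print(value, datetime_parts(value))
```

Each primitive is applied row by row in the same way, so one extra primitive name yields one extra column per datetime column.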
By default, DFS does not return the raw datetime columns as features, so we have to add join_date and date_of_birth manually, like this:
features += [ft.Feature(es["customers"]["join_date"]), ft.Feature(es["customers"]["date_of_birth"])]
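As I understand it, a feature built directly from a column, such as `ft.Feature(es["customers"]["join_date"])`, is an identity feature: it copies the raw value into the matrix unchanged for each customer. If the matrix has already been computed without those columns, joining them back on customer_id gives the same result. The sketch below shows that join in plain Python, with dict-based rows standing in for DataFrames; `append_raw_columns` is a hypothetical helper, and the few feature values are copied from the output for customers 1 and 2.

```python
from datetime import datetime

# Raw datetime columns for two customers, as in customers_df.
RAW_CUSTOMERS = {
    1: {"join_date": datetime(2011, 4, 17, 10, 48, 33), "date_of_birth": datetime(1994, 7, 18)},
    2: {"join_date": datetime(2012, 4, 15, 23, 31, 4), "date_of_birth": datetime(1986, 8, 18)},
}

# A few engineered values for the same customers, keyed by customer_id.
FEATURE_ROWS = {
    1: {"COUNT(transactions)": 126, "HOUR(join_date)": 10},
    2: {"COUNT(transactions)": 93, "HOUR(join_date)": 23},
}


def append_raw_columns(feature_rows, raw_rows, columns):
    """Hypothetical helper: copy the named raw columns, unchanged, onto the
    feature row with the same customer_id, appending them at the end."""
    merged = {}
    for customer_id, features in feature_rows.items():
        row = dict(features)  # copy so the input rows are not mutated
        for column in columns:
            row[column] = raw_rows[customer_id][column]
        merged[customer_id] = row
    return merged


fm = append_raw_columns(FEATURE_ROWS, RAW_CUSTOMERS, ["join_date", "date_of_birth"])
print(fm[1])
```

Adding the identity features before calling ft.calculate_feature_matrix avoids this extra step, which is why the answer takes that route.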
Now we can calculate the features on the actual data:
fm = ft.calculate_feature_matrix(entityset=es, features=features)
This returns a feature matrix with join_date and date_of_birth as the last two columns:
zip_code COUNT(transactions) DAY(date_of_birth) WEEKEND(join_date) COUNT(sessions) WEEKEND(date_of_birth) HOUR(date_of_birth) DAY(join_date) MODE(sessions.device) SUM(transactions.amount) YEAR(join_date) HOUR(join_date) YEAR(date_of_birth) MONTH(join_date) MEAN(transactions.amount) MODE(transactions.product_id) MONTH(date_of_birth) MEAN(sessions.COUNT(transactions)) MODE(sessions.MODE(transactions.product_id)) MEAN(sessions.MEAN(transactions.amount)) MODE(sessions.MONTH(session_start)) MODE(sessions.DAY(session_start)) MEAN(sessions.SUM(transactions.amount)) MODE(sessions.YEAR(session_start)) MODE(sessions.HOUR(session_start)) SUM(sessions.MEAN(transactions.amount)) join_date date_of_birth
customer_id
1 60091 126 18 True 8 False 0 17 mobile 9025.62 2011 10 1994 4 71.631905 4 7 15.750000 4 72.774140 1 1 1128.202500 2014 6 582.193117 2011-04-17 10:48:33 1994-07-18
2 13244 93 18 True 7 False 0 15 desktop 7200.28 2012 23 1986 4 77.422366 4 8 13.285714 3 78.415122 1 1 1028.611429 2014 3 548.905851 2012-04-15 23:31:04 1986-08-18
3 13244 93 21 True 6 False 0 13 desktop 6236.62 2011 15 2003 8 67.060430 1 11 15.500000 1 67.539577 1 1 1039.436667 2014 5 405.237462 2011-08-13 15:42:34 2003-11-21
4 60091 109 15 False 8 False 0 8 mobile 8727.68 2011 20 2006 4 80.070459 2 8 13.625000 1 81.207189 1 1 1090.960000 2014 1 649.657515 2011-04-08 20:08:14 2006-08-15
5 60091 79 28 True 6 True 0 17 mobile 6349.66 2010 5 1984 7 80.375443 5 7 13.166667 3 78.705187 1 1 1058.276667 2014 0 472.231119 2010-07-17 05:27:50 1984-07-28
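As a sanity check, the date-part columns can be recomputed from the raw join_date and date_of_birth values in the last two columns. The plain-Python sketch below (standard library only; `check_row` is a hypothetical helper) copies DAY(join_date), HOUR(join_date), WEEKEND(join_date) and HOUR(date_of_birth) from each of the five rows and recomputes them, again assuming WEEKEND means Saturday or Sunday. It also shows why HOUR(date_of_birth) is 0 in every row: date_of_birth has no time of day, so parsing it yields midnight.

```python
from datetime import datetime

# (join_date, date_of_birth, DAY(join_date), HOUR(join_date),
#  WEEKEND(join_date), HOUR(date_of_birth)) copied from the five output rows.
ROWS = [
    ("2011-04-17 10:48:33", "1994-07-18", 17, 10, True, 0),
    ("2012-04-15 23:31:04", "1986-08-18", 15, 23, True, 0),
    ("2011-08-13 15:42:34", "2003-11-21", 13, 15, True, 0),
    ("2011-04-08 20:08:14", "2006-08-15", 8, 20, False, 0),
    ("2010-07-17 05:27:50", "1984-07-28", 17, 5, True, 0),
]


def check_row(join_str, dob_str, day, hour, weekend, dob_hour):
    """Hypothetical helper: True if the printed values match the raw timestamps."""
    join = datetime.strptime(join_str, "%Y-%m-%d %H:%M:%S")
    dob = datetime.strptime(dob_str, "%Y-%m-%d")  # date only, so hour is 0
    return (
        join.day == day
        and join.hour == hour
        and (join.weekday() >= 5) == weekend  # assumption: Sat/Sun = weekend
        and dob.hour == dob_hour
    )


mismatches = [row for row in ROWS if not check_row(*row)]
print("mismatched rows:", mismatches)
```

Because HOUR(date_of_birth) is constant for date-only values, it carries no information here and is a candidate to drop from the feature list.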