
featuretools: how can I apply the `time_since` and `time_since_first` primitives to an integer time index?


When the time index is an integer (e.g. starting from 0 for each user), running DFS shows this warning:

UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  agg_primitives: ['avg_time_between', 'time_since_first', 'time_since_last', 'trend']
  groupby_trans_primitives: ['cum_count', 'time_since', 'time_since_previous']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.

However, the time index can be an integer in many cases (e.g. https://www.kaggle.com/c/riiid-test-answer-prediction/data):

  • When user1 starts being logged, the timestamp of its first log is 0 (say, at 2020-01-01 13:10:10).
  • When user1's second log is recorded 100 seconds later, the timestamp of that log is 100.
  • Likewise, when user2 starts being logged, the timestamp of its first log is also 0 (note: at 2020-11-01 00:00:00 this time).
  • As you can see, user1 and user2 started being logged at different dates and times, but they share the same time index values. The dataset appears to use relative time as a feature.
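To make the layout concrete, here is a small sketch of how such a relative integer time index arises. The log times for user2's second event and the helper name `to_relative_seconds` are hypothetical, chosen only for illustration; they are not taken from the Kaggle dataset.

```python
from datetime import datetime

# Hypothetical absolute log times for two users (illustrative values only).
logs = {
    "user1": [datetime(2020, 1, 1, 13, 10, 10), datetime(2020, 1, 1, 13, 11, 50)],
    "user2": [datetime(2020, 11, 1, 0, 0, 0), datetime(2020, 11, 1, 0, 0, 30)],
}


def to_relative_seconds(times):
    """Express each time as whole seconds elapsed since the user's first log."""
    start = min(times)
    return [int((t - start).total_seconds()) for t in times]


# Each user's index restarts at 0, so the absolute date is no longer recoverable.
relative = {user: to_relative_seconds(times) for user, times in logs.items()}
print(relative)
```

Both users end up with a first timestamp of 0 even though their logs began ten months apart, which is why datetime-based primitives have nothing to work with here.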

In this case, even though I set the timestamp variable as ft.variable_types.TimeIndex(numeric_time_index) when creating the entityset, the same warning still appeared, and the features generated by ['avg_time_between', 'time_since_first', 'time_since_last', 'trend'] were missing.

How can I handle it?


Solution

  • Thanks for the question. The time_since and time_since_first primitives are currently implemented to handle only Datetime and DatetimeTimeIndex variables. To handle cases where the time index is numeric, you can create custom primitives to handle NumericTimeIndex variables.

    from featuretools.primitives import AggregationPrimitive, TransformPrimitive
    from featuretools.variable_types import NumericTimeIndex
    
    
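    # Transform primitive that accepts a numeric time index as its input.
    class TimeSinceNumeric(TransformPrimitive):
        input_types = [NumericTimeIndex]
        ...  # rest of the primitive definition omitted
    
    
    # Aggregation primitive that accepts a numeric time index as its input.
    class TimeSinceFirstNumeric(AggregationPrimitive):
        input_types = [NumericTimeIndex]
        ...  # rest of the primitive definition omitted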
    

    Then, you can pass in the custom primitives directly to DFS.

    import featuretools as ft

    ft.dfs(
        ...,  # remaining arguments omitted
        trans_primitives=[TimeSinceNumeric],
        agg_primitives=[TimeSinceFirstNumeric],
    )
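
The primitive bodies are elided with `...` in the snippet above. Whatever fills them has to do plain integer arithmetic instead of datetime arithmetic. The following is a minimal sketch of that arithmetic only, written without featuretools so it can be run anywhere. The function names, the `cutoff` parameter, and the use of plain lists are assumptions made for illustration; in an actual custom primitive, logic like this would typically live in the function returned by the primitive's `get_function` method and would operate on pandas Series.

```python
from typing import List, Optional, Sequence


def time_since_first(times: Sequence[int], cutoff: int) -> Optional[int]:
    """Elapsed time between the earliest event and the cutoff."""
    if not times:
        return None
    return cutoff - min(times)


def time_since_last(times: Sequence[int], cutoff: int) -> Optional[int]:
    """Elapsed time between the latest event and the cutoff."""
    if not times:
        return None
    return cutoff - max(times)


def time_since_previous(times: Sequence[int]) -> List[Optional[int]]:
    """Gap between each event and the one before it (None for the first event)."""
    if not times:
        return []
    gaps: List[Optional[int]] = [None]
    gaps.extend(b - a for a, b in zip(times, times[1:]))
    return gaps


def avg_time_between(times: Sequence[int]) -> Optional[float]:
    """Mean gap between consecutive events, or None with fewer than two events."""
    if len(times) < 2:
        return None
    ordered = sorted(times)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Hypothetical relative timestamps (in seconds) for a single user.
user1 = [0, 100, 250]
print(time_since_first(user1, cutoff=300))
print(time_since_last(user1, cutoff=300))
print(time_since_previous(user1))
print(avg_time_between(user1))
```

Because every user's index restarts at 0, these values are only meaningful when computed per user, which is why the question lists `time_since` and `time_since_previous` under `groupby_trans_primitives`.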