
Featuretools with a single table and the Min primitive gives an error


My environment is:

Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0

and my pandas dataframe looks like:

df
    index  BoxRatio    Thrust  Velocity  OnBalRun  vwapGain
0      1  0.324000  0.615000  1.525000  3.618000  0.416000
1      2  0.938249  0.366377  2.402230  6.393223  2.667106
2      3  0.317000 -0.281000  0.979000  1.489000  0.506000
3      4  0.289000 -0.433000  0.796000  2.081000  0.536000
4      5  1.551115 -0.103734  0.731682  1.752156  0.667016

I have tried the following:

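  import numpy as np
  import featuretools as ft
  from featuretools.primitives import make_trans_primitive
  from featuretools.primitives import (Count, Sum, Mean, Median, Std, Min, Max, Multiply)
  from featuretools.variable_types import Numeric

  es = ft.EntitySet('Pattern')
  es.entity_from_dataframe(dataframe=df,
                           entity_id='my_id',
                           index='index')

  # Custom transform primitive: element-wise base-10 logarithm
  def log10(column):
    return np.log10(column)

  Log10 = make_trans_primitive(function=log10,
                               input_types=[Numeric],
                               return_type=Numeric)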

  feature_matrix, feature_names = ft.dfs(entityset=es, 
                                         target_entity='my_id',
                                         trans_primitives=[Log10])
  print('feature_names:\n')
  for item in feature_names:
    print('  ' + item)

Which gives the following:

feature_names:
<Feature:    + BoxRatio>
<Feature:    + Thrust>
<Feature:    + Velocity>
<Feature:    + OnBalRun>
<Feature:    + vwapGain>
<Feature:    + LOG10(BoxRatio)>
<Feature:    + LOG10(Thrust)>
<Feature:    + LOG10(Velocity)>
<Feature:    + LOG10(OnBalRun)>
<Feature:    + LOG10(vwapGain)>

So far so good. Now if I add the Min primitive to the list (trans_primitives=[Log10, Min]), I get:

Traceback (most recent call last):
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
    Main()
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
    trans_primitives=[Log10, Min])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
    features = dfs_object.build_features(verbose=verbose)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
    all_features, max_depth=self.max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
    all_features, entity, max_depth=max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
    new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'

I expected to see the minimum of each column feature (just like the Log10 primitive). Of course I can define my own Min primitive, but I'm hoping there is a simple solution.

Charles


Solution

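  • The problem here is that Min is an aggregation primitive, while Log10 is a transform primitive. Because Min was passed in trans_primitives, DFS tried to construct it like a transform feature, and that fails because an aggregation feature also requires a parent_entity argument.

    Aggregation primitives take related instances as an input and output a single value. They are applied across a parent-child relationship in an entity set, and they are passed to ft.dfs through agg_primitives rather than trans_primitives. For example, Min takes in a list of values and returns the minimum of the list. Your entity set has a single entity and no relationships, so there is nothing for Min to aggregate over.

    Transform primitives take one or more variables from an entity as an input and output a new variable for that entity. They are applied to a single entity. For example, Log10 takes in a list of values and returns a list of the same length with the base-10 log of each item in the input.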

    You can read more in the documentation about primitives: https://docs.featuretools.com/automated_feature_engineering/primitives.html
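
To make the distinction concrete, here is a minimal sketch in plain pandas and NumPy rather than Featuretools itself. The `group` column is a hypothetical parent key that does not exist in the original dataframe; it only stands in for the parent-child relationship an aggregation primitive needs.

```python
import numpy as np
import pandas as pd

# Hypothetical child table: several rows per parent. The "group" column is
# an assumption for illustration; the original dataframe has no such column.
child = pd.DataFrame({
    "group":    ["a", "a", "b", "b", "b"],
    "BoxRatio": [0.324, 0.938249, 0.317, 0.289, 1.551115],
})

# Transform: one output value per input row, so the length is unchanged.
log10_box_ratio = np.log10(child["BoxRatio"])

# Aggregation: one output value per parent, computed over its child rows.
min_box_ratio = child.groupby("group")["BoxRatio"].min()

print(len(log10_box_ratio))     # same number of rows as the input
print(min_box_ratio.to_dict())  # one minimum per group
```

The transform keeps one value per row, while the aggregation collapses each group to a single value. That is why Min needs a parent entity to group by, and why it cannot be used as a transform on a single table.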