Tags: python, data-ingestion, mlops, feature-store, mlrun

Issue with ingesting values: the pipeline step is called twice as often as expected


When I ingest values into the feature set, the pipeline step is called twice as often as expected (MLRun version 1.2.1). This looks like a bug; do you know why it happens?

I used this code:

import mlrun
import mlrun.feature_store as fstore

# mlrun: start-code
import math

def calc(x):
    x['fn2']=math.sin(x['fn2'])*100.0
    print('calc')
    return x

# mlrun: end-code

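# Load MLRun environment settings from the env file
mlrun.set_env_from_file("mlrun-nonprod.env")
project = mlrun.get_or_create_project(project_name, context='./', user_project=False)
# Load the existing feature set
feature_derived = fstore.get_feature_set(f"{project_name}/{feature_derivedName}")
...
# dataFrm has only two rows
# Add the calc handler as a step in the feature set's graph
feature_derived.graph.to(name="calc", handler='calc')
# Ingest the DataFrame; calc should run once per row
fstore.ingest(feature_derived, dataFrm)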

For dataFrm with two rows, I got this output (the calc method was called four times):

> calc
> calc 
> calc
> calc

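A quick way to rule out the handler itself is to call it directly, outside MLRun. The sketch below is a plain-Python check, assuming each row reaches the step as a dict keyed by column name; the two row values are hypothetical. Called this way, calc runs exactly once per row, which points at how ingest drives the graph rather than at calc.

```python
import math

calls = []  # records each invocation instead of printing 'calc'

def calc(x):
    # Same transformation as the question's handler
    x['fn2'] = math.sin(x['fn2']) * 100.0
    calls.append('calc')
    return x

# Two hypothetical rows standing in for dataFrm, each as a dict of column -> value
rows = [{'fn2': 0.0}, {'fn2': math.pi / 2}]
results = [calc(dict(row)) for row in rows]

print(len(calls))  # 2: one call per row
print([r['fn2'] for r in results])
```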
Solution

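  • The solution is simple: switch off the preview (inference) mode by setting infer_options=0 in the ingest call. By default, ingest first runs the graph over the data to infer the schema and preview, and then runs it again for the actual ingestion, so every row passes through calc twice. With infer_options=0 nothing is inferred during this call, so it is best suited to a feature set whose features are already defined, such as the one loaded here with get_feature_set. The relevant part of the code: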

    ...
    feature_derived.graph.to(name="calc", handler='calc')
    fstore.ingest(feature_derived, dataFrm, infer_options=0)
    ...
    

    The output now shows calc called only twice, once per row, as expected:

    > calc
    > calc
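The double call can be modelled without MLRun. This sketch assumes, as the answer above implies, that by default ingest traverses the graph once for the preview/inference pass and once more for the actual ingestion. run_graph, ingest_sketch and its infer flag are hypothetical names used only for illustration, not MLRun APIs; in MLRun the corresponding switch is the infer_options argument of ingest.

```python
import math

def run_graph(rows, step):
    # One traversal of a single-step graph: the step runs once per row
    return [step(dict(row)) for row in rows]

def ingest_sketch(rows, step, infer=True):
    # Hypothetical model of ingest: an optional preview/inference pass,
    # followed by the real ingestion pass over the same data
    if infer:
        run_graph(rows, step)  # preview pass, output used only for inference
    return run_graph(rows, step)  # ingestion pass

def count_calls(rows, infer):
    counter = {'n': 0}

    def calc(x):
        # Same transformation as the question's handler, plus a call counter
        counter['n'] += 1
        x['fn2'] = math.sin(x['fn2']) * 100.0
        return x

    ingest_sketch(rows, calc, infer=infer)
    return counter['n']

rows = [{'fn2': 0.0}, {'fn2': 1.0}]  # hypothetical two-row input
print(count_calls(rows, infer=True))   # 4: two rows x two passes
print(count_calls(rows, infer=False))  # 2: ingestion pass only
```

Because the preview pass executes the real step code, any side effect in a step (printing, logging, calling an external service) happens once per pass. Keeping steps free of such side effects, or disabling inference as above, avoids the duplicated work.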