python feature-store mlrun

Possible data ingest count issue in FeatureStore


I think there is a mistake: the count of values in the FeatureStore statistics does not match the number of ingested values. See this sample:

import mlrun
import mlrun.feature_store as fstore
from mlrun.datastore.sources import CSVSource

...
project_name = 'test-load'
project = mlrun.get_or_create_project(project_name, context='./', user_project=True)
..
fset = fstore.FeatureSet("test01", entities=['id'])
# ingest 3 values
fstore.ingest(fset, CSVSource("mycsv", path="a1.csv"), overwrite=False)
# ingest 3 more values (appended, because overwrite=False)
fstore.ingest(fset, CSVSource("mycsv", path="a2.csv"), overwrite=False)

but I see only 3 values in the statistics, see the screenshot:

[Screenshot: FeatureStore statistics]

Do you see any issue?


Solution

  • The key is that the statistics reflect the data from the last ingestion ONLY. The number of values stored by the ingestions is therefore correct; the statistics simply do not show the total. You can check the total number of values through, for example, a FeatureVector, see the sample code:

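    import mlrun.feature_store as fstore

    ...
    features = ["test01.F_2"]

    # Build a vector over the feature set; with_indexes=True keeps the entity column
    vector = fstore.FeatureVector("test_vector", features=features, with_indexes=True)
    resp = fstore.get_offline_features(vector)

    # Return values based on the vector definition
    df = resp.to_dataframe()
    # The row count covers all ingestions, not only the last one
    print(len(df))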
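
To make the difference between the two counts concrete, here is a minimal toy model in plain Python. It is only a sketch of the behaviour described above, not MLRun's implementation: the class `ToyFeatureSet` and its attributes `rows` and `stats_count` are hypothetical names invented for this illustration, and the sample rows are made up.

```python
# Toy model only: NOT MLRun code. All names here are hypothetical.
class ToyFeatureSet:
    """Stand-in for a feature set with an append-only offline target."""

    def __init__(self):
        self.rows = []        # everything stored so far (the offline data)
        self.stats_count = 0  # the "count" statistic, as shown in a UI

    def ingest(self, batch, overwrite=False):
        if overwrite:
            self.rows = []    # overwrite=True replaces the stored data
        self.rows.extend(batch)
        # Assumption from the answer above: statistics describe
        # only the batch that was just ingested.
        self.stats_count = len(batch)


fset = ToyFeatureSet()
# First ingestion: 3 rows (stands in for a1.csv)
fset.ingest([{"id": i, "F_2": i * 10} for i in range(1, 4)])
# Second ingestion: 3 more rows (stands in for a2.csv)
fset.ingest([{"id": i, "F_2": i * 10} for i in range(4, 7)])

print(fset.stats_count)  # 3, the last ingestion only
print(len(fset.rows))    # 6, the total over both ingestions
```

In this model the statistic and the stored data diverge after the second call, which matches the observation in the question: 3 in the statistics, 6 rows when the data is read back.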