azure, azure-machine-learning-service

How to replace missing data points with the previous value in MS Azure?


When using the ML pipeline designer in MS Azure, it is possible to clean missing data by replacing it with the mean or a constant value.

My dataset has gaps wherever the measured value did not change enough to be recorded, so I want to replace each missing entry with the last existing one. So from

VALUE A
2
NONE
NONE
NONE
3
NONE
NONE

I would like to get

VALUE A
2
2
2
2
3
3
3

As far as I know, this option is not available in the pipeline designer. Is there another way to manipulate the dataset within Azure before training?


Solution

  • I figured it out by using the Notebooks (they do not work in Firefox for me, only in Chrome). There it is possible to load the dataset in Python, convert it to a pandas DataFrame, manipulate it, and save it back to the datastore.
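
The manipulation step in pandas could look roughly like the sketch below. It is only an illustration, not the exact code used in the notebook: the DataFrame is built by hand from the example in the question, and it assumes the gaps arrive as the literal string "NONE". Loading the dataset from Azure and writing it back to the datastore are left out.

```python
import pandas as pd

# Toy data mirroring the example in the question.
# Assumption: gaps are stored as the string "NONE".
df = pd.DataFrame({"VALUE A": [2, "NONE", "NONE", "NONE", 3, "NONE", "NONE"]})

# Convert the column to numbers; anything non-numeric ("NONE") becomes NaN.
df["VALUE A"] = pd.to_numeric(df["VALUE A"], errors="coerce")

# Forward fill: each gap takes the last existing value above it.
df["VALUE A"] = df["VALUE A"].ffill()

print(df["VALUE A"].tolist())  # [2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
```

Note that a gap in the very first row has no earlier value to copy, so it stays missing after the forward fill and would need separate handling.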