python, pandas, graphlab

How can one replace missing values with median or mode in SFrame?


I'm going through the GraphLab documentation and trying to figure out how to duplicate the pandas functionality where NA values are replaced by the median, the mean, the mode, etc. In pandas you can get these statistics simply with df.dropna().median() or df.dropna().mean(), and so on.

But the documentation for the dropna and fillna functions of SFrame doesn't mention anything similar. Is it possible at all in SFrame?


Solution

  • It is possible, but the built-in imputer only supports the mean, not the median. Have a look at graphlab.toolkits.feature_engineering.NumericImputer (doc):

    Impute missing values with feature means.

    Input columns to the NumericImputer must be of type int, float, dict, list, or array.array. For each column in the input, the transformed output is a column where the input is retained as is if:

    • there is no missing value.

    Inputs that do not satisfy the above are set to the mean value of that feature.

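    If the median is what you want, you can compute it yourself and pass it to fillna (with numpy imported as np). Drop the missing values before computing the median, since they would otherwise break or distort the calculation, and keep the return value, because fillna returns a new SFrame rather than modifying data in place:

    median = np.median(list(data['feature_name'].dropna()))
    data = data.fillna('feature_name', median)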
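
The same idea extends to the mode, which the question also asks about. Below is a minimal sketch of the logic in plain Python, using only the standard library so it runs without GraphLab installed. The helper name impute and the sample ages list are made up for illustration and are not part of any GraphLab API. With an SFrame, the fill value computed this way would be passed to fillna as in the snippet above.

```python
import statistics


def impute(values, strategy="median"):
    """Return a copy of `values` with None replaced by a summary statistic.

    Hypothetical helper for illustration only; not a GraphLab function.
    """
    # Compute the statistic over the non-missing values only.
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to base a fill value on, so return the data unchanged.
        return list(values)

    if strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "mode":
        fill = statistics.mode(present)
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    # Replace only the missing entries; everything else is kept as is.
    return [fill if v is None else v for v in values]


# Made-up sample column with two missing entries.
ages = [22, None, 35, 22, None, 41]
median_filled = impute(ages, "median")
mode_filled = impute(ages, "mode")
```

For an SFrame column, the list of non-missing values could be obtained with list(data['feature_name'].dropna()), and the chosen statistic then handed to data.fillna('feature_name', ...).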