python swift scikit-learn coremltools

Converting a scikit-learn model to a Core ML model for iOS


I have created a scikit-learn model for predicting ESG scores. Because of imputation, my input feature matrix X was converted to an np.ndarray. I found instructions on the Apple Developer website for converting such a model, but convert() takes the optional arguments input_features and output_feature_names.

import coremltools
coreml_model = coremltools.converters.sklearn.convert(model,
                                                    ["bedroom", "bath", "size"],
                                                    "price")

coreml_model.save('HousePricer.mlmodel')

Do the feature names in X have to match the values I pass to convert() through the input_features argument?

For example:

from sklearn.impute import KNNImputer

imputer = KNNImputer()

# Impute missing values in X
X_imputed = imputer.fit_transform(X)

X_imputed

Out:

array([[ 5.00000e+00,  1.80000e+04,  3.27000e+08, ..., -2.20000e+07,
         2.64000e+08,  2.64000e+08],
       [ 6.00000e+00,  1.32500e+05,  9.00000e+06, ...,  9.38000e+08,
         1.90000e+07,  1.90000e+07],
       [ 2.00000e+00,  4.00000e+04,  2.37765e+08, ..., -1.88580e+07,
         1.78696e+08,  1.78696e+08],
       ...,
     ])

So there are no labels now.

The question is how to convert the model to a Core ML model now that there are no labels to pass as input_features.

Unfortunately, I don't have access to macOS to try different approaches and test them.

I would be grateful if you showed not only how to convert the model but also a sample usage of it in Swift.

(The model is based on a Random Forest Regressor.)


Solution

  • There is a dedicated method for this: ct.converters.sklearn.convert.

    Here is a full example of the conversion:

    from sklearn.linear_model import LinearRegression
    import numpy as np
    import coremltools as ct
    
    # Generate random training data (plain NumPy arrays, no column labels)
    X = np.random.rand(10, 5)
    y = np.random.rand(10)
    
    # Train a model
    model = LinearRegression()
    model.fit(X, y)
    
    # Convert the scikit-learn model to Core ML and save it
    coreml_model = ct.converters.sklearn.convert(model)
    coreml_model.save('Predictor.mlmodel')
    

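    Printing coreml_model shows that, with no input_features given, the converter creates a single multi-array input named "input" whose length equals the number of feature columns: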

    In [13]: coreml_model
    Out[13]:
    input {
      name: "input"
      type {
        multiArrayType {
          shape: 5
          dataType: DOUBLE
        }
      }
    }
    output {
      name: "prediction"
      type {
        doubleType {
        }
      }
    }
    predictedFeatureName: "prediction"
    metadata {
      userDefined {
        key: "com.github.apple.coremltools.source"
        value: "scikit-learn==1.1.1"
      }
      userDefined {
        key: "com.github.apple.coremltools.version"
        value: "7.1"
      }
    }