
How to modify the input/output schema for an Azure deployment service?


I have started developing machine learning models in Microsoft Azure Machine Learning Studio. The tutorials and documentation for the service are fairly clear, but I could not find the information I need about deploying a model as a web service.

I would like to understand why the input schema requires the variable to predict, and why the output returns every field that was sent as input. Part of the information exchanged in this request/response is useless. I am wondering whether it is possible to modify the schema manually.

I searched the configuration tab of the web service panel but found no way to modify the schema passed to the model.

The JSON below is the input schema that the model requires; the value to predict is WallArea. Passing this variable is not really useful, since it is the one we are trying to predict (except when we want to compare the actual and predicted values for testing purposes).

{
  "Inputs": {
    "input1": {
      "ColumnNames": [
        "WallArea",
        "RoofArea",
        "OverallHeight",
        "GlazingArea",
        "HeatingLoad"
      ],
      "Values": [
        [
          "0",
          "0",
          "0",
          "0",
          "0"
        ]
      ]
    }
  },
  "GlobalParameters": {}
}
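Until the schema itself is changed, one client-side workaround is to fill the unwanted column with a placeholder when building the request. The following is only a minimal sketch: `build_request` is a hypothetical helper, the feature values are made up for illustration, and the column order is copied from the schema above.

```python
import json

# Column order copied from the input schema shown above.
COLUMN_NAMES = ["WallArea", "RoofArea", "OverallHeight", "GlazingArea", "HeatingLoad"]


def build_request(features, column_names=COLUMN_NAMES, placeholder="0"):
    """Build a request body; columns missing from `features` get a placeholder.

    Hypothetical helper, not part of any Azure SDK.
    """
    row = [str(features.get(name, placeholder)) for name in column_names]
    return {
        "Inputs": {
            "input1": {"ColumnNames": list(column_names), "Values": [row]}
        },
        "GlobalParameters": {},
    }


# Illustrative (made-up) values; WallArea is omitted and padded with "0".
body = build_request(
    {"RoofArea": 110.25, "OverallHeight": 7, "GlazingArea": 0.1, "HeatingLoad": 15.55}
)
print(json.dumps(body, indent=2))
```

This does not shrink the schema; it only hides the padding from the calling code.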

The JSON returned by the model contains the predicted value along with all the input data. That is much more information than we really need ("Scored Label Mean" and "Scored Label Standard Deviation").

{
  "Results": {
    "output1": {
      "type": "DataTable",
      "value": {
        "ColumnNames": [
          "WallArea",
          "RoofArea",
          "OverallHeight",
          "GlazingArea",
          "HeatingLoad",
          "Scored Label Mean",
          "Scored Label Standard Deviation"
        ],
        "ColumnTypes": [
          "Numeric",
          "Numeric",
          "Numeric",
          "Numeric",
          "Numeric",
          "Numeric",
          "Numeric"
        ],
        "Values": [
          [
            "0",
            "0",
            "0",
            "0",
            "0",
            "0",
            "0"
          ]
        ]
      }
    }
  }
}
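On the client side, the two useful fields can be picked out of a response of this shape by column name. This is a minimal sketch, assuming the response has already been parsed into a Python dict; `extract_columns` is a hypothetical helper and the sample response simply mirrors the JSON above.

```python
def extract_columns(response, wanted, output_name="output1"):
    """Return one dict per result row, keeping only the `wanted` columns.

    Hypothetical helper; assumes the DataTable layout shown above.
    """
    table = response["Results"][output_name]["value"]
    # Map each column name to its position in a row.
    index = {name: i for i, name in enumerate(table["ColumnNames"])}
    return [{name: row[index[name]] for name in wanted} for row in table["Values"]]


# Sample response mirroring the JSON returned by the service above.
response = {
    "Results": {
        "output1": {
            "type": "DataTable",
            "value": {
                "ColumnNames": [
                    "WallArea", "RoofArea", "OverallHeight", "GlazingArea",
                    "HeatingLoad", "Scored Label Mean",
                    "Scored Label Standard Deviation",
                ],
                "ColumnTypes": ["Numeric"] * 7,
                "Values": [["0", "0", "0", "0", "0", "0", "0"]],
            },
        }
    }
}

scores = extract_columns(
    response, ["Scored Label Mean", "Scored Label Standard Deviation"]
)
print(scores)
```

The full table is still sent over the wire; this only trims what the calling code has to deal with.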

My question is: how can I reduce the input/output schema, if that is possible, and why must the variable to predict be sent with the input schema?


Solution

  • I found the solution.

    For those who have the same problem, it is in fact pretty simple. You need to add two Select Columns in Dataset boxes to your predictive experiment.

    Update 2020: Following some updates to the service, the proposed solution is partially broken. If you do not include the outcome in the first Select Columns in Dataset box, you will not be able to retrieve it in the second one, which leads to an error. To fix this, remove the first Select Columns in Dataset box and pass all features. Nothing changes for the second box: select the fields you want in the prediction response.