CNTK classification model classifies everything as 1


I have a CNTK model which takes in features related to clicks and other information and predicts whether something will be clicked in the future. Using the same features, a random forest works fine, but CNTK classifies everything as 1. Why does this happen? Is there any parameter tuning needed? The features have varying scales. My train action looks like this:

    BrainScriptNetworkBuilder = [
    inputD = $inputD$
    labelD = $labelD$
    #hidden1 = $hidden1$
    model(features) = {
        w0 = ParameterTensor{(1 : 2), initValueScale=10}; b0 = ParameterTensor{1, initValueScale=10};
        h1 = w0*features + b0; #hidden layer
        z = Sigmoid (h1)
    }.z
    features = Input(inputD)
    labels = Input(labelD)

    z = model(features)
    #now that we have output, find error
    err = SquareError (labels, z)
    lr = Logistic (labels, z)
    output = z

    criterionNodes = (err)
    evaluationNodes = (err)
    outputNodes = (z)
    ]

    SGD = [
        epochSize = 4 #learn
        minibatchSize = 1 #learn
        maxEpochs = 1000 #learn
        learningRatesPerSample = 1
        numMBsToShowResult = 10000
        firstMBsToShowResult = 10
    ]

Solution

  • In addition to what KeD said, a random forest does not care about the actual values of the features, only about their relative order.

    Unlike trees, neural networks are sensitive to the actual values of the features (rather than just their relative order); the first sketch below demonstrates this difference.

    Your input might contain some features with very large values. You should probably recode them. There are different schemes for doing this. One possibility is to subtract the mean from each feature and scale it to [-1, 1], or divide it by its standard deviation. Another possibility for positive features is a transformation such as f => log(1+f). You could also use a batch normalization layer.
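
    To see the trees-versus-networks point concretely, here is a quick sanity check (a sketch using scikit-learn and synthetic data, not your actual setup): a random forest trained on a monotone transform of the features makes the same predictions as one trained on the raw features, because tree splits depend only on the ordering of values.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.RandomState(0)
        X = rng.exponential(scale=1000.0, size=(200, 2))  # skewed, large-valued features
        y = (X[:, 0] > X[:, 1]).astype(int)               # toy target

        # log1p is strictly monotone, so it preserves the order of feature values;
        # with the same seed, both forests choose the same splits (up to thresholds).
        rf_raw = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
        rf_log = RandomForestClassifier(n_estimators=50, random_state=0).fit(np.log1p(X), y)

        print((rf_raw.predict(X) == rf_log.predict(np.log1p(X))).all())  # True

    A neural network like yours has no such invariance: rescaling a feature changes the gradients and can easily saturate the sigmoid, so it can get stuck predicting a single class.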
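
    The recoding schemes above are straightforward to apply before feeding the data to CNTK. A minimal NumPy sketch (assuming a dense feature matrix X with one row per sample; the data and names are illustrative):

        import numpy as np

        # Hypothetical feature matrix: rows = samples, columns = features
        # with very different scales (e.g. raw counts next to ratios).
        X = np.array([[100000.0, 0.02],
                      [250000.0, 0.75],
                      [  5000.0, 0.10]])

        # Scheme 1: subtract the mean and divide by the standard deviation.
        mean, std = X.mean(axis=0), X.std(axis=0)
        std[std == 0] = 1.0                    # guard against constant features
        X_standardized = (X - mean) / std

        # Scheme 2: rescale each feature to [-1, 1].
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)
        X_scaled = 2.0 * (X - lo) / span - 1.0

        # Scheme 3: for non-negative, heavy-tailed features, f => log(1 + f).
        X_log = np.log1p(X)

    Whichever scheme you pick, compute the statistics (mean, standard deviation, min, max) on the training set only and apply the same transform at prediction time, so that train and test features stay on the same scale.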