Tags: r, machine-learning, neural-network, recurrent-neural-network, softmax

Conversion of output activation with Softmax produces similar values


I trained a simple recurrent network (62 input units, 124 hidden/context units, 62 output units) to predict the subsequent word in a sentence, using the sigmoid activation function. For various reasons it was not possible to apply softmax during training, so my professor suggested applying softmax afterwards to the network's output. The output matrix has 576 rows and 62 columns. I implemented softmax in R in the following way:

softmax <- function(outVec = NULL){
  s.vec <- exp(outVec)/sum(exp(outVec))
  return(s.vec)
}

softmax_complete <- function(vec = NULL){
  softmaxed.vec <- matrix(apply(vec, 1, softmax), ncol=dim(vec)[2], nrow=dim(vec)[1])
  return(softmaxed.vec)
}

Each row of the matrix that softmax_complete() returns sums roughly to 1. The problem is that, within each row, my function produces values which are very similar to each other. It's not possible to evaluate the network's performance because the values are almost "the same".

Here is some small example data. It's the second row of the network's output, before softmax has been applied.

output.vec <- c(0.2371531427, 0.0085829534, 0.0007576860, 0.0027021256, 0.0025776778, 0.0014593119, 0.0019006504, 0.0403518006,
                0.0024586972, 0.0517364480, 0.0012057235, 0.0950696915, 0.0025749709, 0.0008823058, 0.0005064047, 0.0014039490,
                0.0013259919, 0.0014723240, 0.0011820868, 0.0011805159, 0.0009319001, 0.0022884205, 0.0023589570, 0.0020189525,
                0.0015377736, 0.0937648788, 0.0012874968, 0.0443032309, 0.0012919122, 0.0897148922, 0.0022041877, 0.0444274731,
                0.0014143962, 0.0361100733, 0.0020817134, 0.0447632931, 0.0009620183, 0.0011552101, 0.0016173105, 0.0016870035,
                0.0011272663, 0.0019183536, 0.0017270016, 0.0011056620, 0.0007743868, 0.0026786255, 0.0019340677, 0.0010532230,
                0.0014585924, 0.0386148430, 0.0012295874, 0.0390544645, 0.0017903288, 0.0967107117, 0.0013074477, 0.0006164946,
                0.0001758277, 0.0001023397, 0.0004014068, 0.0004558225, 0.0003554984, 0.0001830685)

When I apply softmax to that row I get the following results:

[1] 0.01585984 0.01583950 0.01567646 0.01583540 0.01735750 0.01579704 0.01587178 0.01589101 0.01586093 0.01590457
[11] 0.01586255 0.01637181 0.01590217 0.01584308 0.01570456 0.01581733 0.01952223 0.01590497 0.01970620 0.01578586
[21] 0.01610417 0.01591379 0.01588095 0.01584309 0.01567710 0.01582956 0.01650858 0.01573117 0.01589502 0.01608836
[31] 0.01574208 0.01582079 0.01584367 0.01569571 0.01583481 0.01596172 0.01577246 0.01586151 0.01605467 0.01574746
[41] 0.01586397 0.01581472 0.01576479 0.01580914 0.01583660 0.01566672 0.01584366 0.02017004 0.01585517 0.02007705
[51] 0.01580263 0.01583277 0.01580424 0.01583763 0.01587117 0.01568283 0.01583775 0.01595945 0.01587471 0.01575585
[61] 0.01584288 0.01577770

The row sum is 1.000703, and for another row, not shown above, the row sum is 0.9976472. What am I doing wrong?

Maybe you guys have an idea to fix that issue. Thank you in advance for your time and help :-)

regards, Matthias

EDIT:

This is how I create the Elman net with RSNNS:

elman <- rsnnsObjectFactory(subclass=c("elman"), nInputs=inputNeurons, maxit=maxIterations,
                            initFunc="JE_Weights", initFuncParams=c(0.15, -0.15, 0, 1.0, 0.5),
                            learnFunc="JE_BP", learnFuncParams=c(learnRate),
                            updateFunc="JE_Order", updateFuncParams=c(0),
                            shufflePatterns=FALSE, computeIterativeError=FALSE)
elman$archParams <- list(size=hiddenNeurons)
elman$snnsObject$elman_createNet(c(inputNeurons, hiddenNeurons, outputNeurons), c(1, 1, 1), FALSE)
elman$snnsObject$initializeNet(c(0.15, -0.15, 0, 1.0, 0.5), initFunc="JE_Weights")
elman$snnsObject$setUnitDefaults(1, 0, 1, 0, 1, "Act_Logistic", "Out_Identity")
elman$snnsObject$setTTypeUnitsActFunc("UNIT_INPUT", "Act_Logistic")
elman$snnsObject$setTTypeUnitsActFunc("UNIT_SPECIAL_H", "Act_Identity")
elman$snnsObject$setTTypeUnitsActFunc("UNIT_OUTPUT", "Act_Logistic")

Solution

  • A more concise version of softmax would be:

    softmax <- function(x){
      score.exp <- exp(x)
      probs <- sweep(score.exp, 1, rowSums(score.exp), '/')
      return(probs)
    }
    

    into which you can pass the matrix directly. Now, looking at the vector you provided:

    res <- softmax(matrix(output.vec, nrow=1))
    sum(res)
    [1] 1
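    For what it's worth, the row sums of 1.000703 and 0.9976472 in the original softmax_complete() appear to come from a transpose issue: apply(vec, 1, softmax) returns one *column* per input row, and re-wrapping that result with matrix(..., ncol=dim(vec)[2], nrow=dim(vec)[1]) refills it column-wise and scrambles the entries. A minimal sketch (the small 2x3 matrix m here is made up for illustration):

```r
# apply() over rows (MARGIN = 1) returns one *column* per input row,
# so the result must be transposed to recover a row-wise softmax.
softmax_vec <- function(outVec) exp(outVec) / sum(exp(outVec))

m <- matrix(c(0.1, 0.9, 0.3,
              0.8, 0.2, 0.5), nrow = 2, byrow = TRUE)

# reproduces the bug: refilling column-wise mixes entries across rows
wrong <- matrix(apply(m, 1, softmax_vec), ncol = ncol(m), nrow = nrow(m))
# correct: transpose the apply() result instead
right <- t(apply(m, 1, softmax_vec))

rowSums(wrong)  # generally not 1 -- entries are scrambled
rowSums(right)  # each row sums to 1
```

    So replacing the matrix() call with t(apply(vec, 1, softmax)) would also fix the original softmax_complete().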
    

    However, it still appears that there isn't much difference between your values. It seems to me that, for this particular sample, there isn't much information provided by your RNN. According to this, the most likely 'class' is the first class, at a probability of about 2%.
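    One way to see why the softmaxed values are so close together: the logistic output units bound every activation to (0, 1), so all the exponentials fall in (1, e) and no class can dominate; with 62 classes every probability is squeezed towards 1/62 ≈ 0.016. A small sketch (the vector x is made up for illustration, with one "confident" unit):

```r
# Sigmoid outputs lie in (0, 1), so exp() of them lies in (1, e).
# The ratio between the largest and smallest softmax probability is
# therefore bounded by e ~ 2.718, keeping all values near 1/62.
x <- c(0.95, rep(0.001, 61))   # one confident unit, 61 near zero
p <- exp(x) / sum(exp(x))

max(p) / min(p)   # < exp(1), no matter how "confident" a sigmoid unit is
range(p)          # everything stays in the neighbourhood of 1/62
```

    Softmax is usually applied to unbounded scores (logits), which is why squashing the scores through a sigmoid first washes most of the differences out.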

    I would recommend trying it across your entire dataset using the function above.

    This all assumes many things about your implementation of the neural net. It would be helpful if you could provide a reference to the software you used and at least the parameters you were setting.
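    As a side note, if the sigmoid outputs are ever replaced with raw, unbounded scores, it is common to subtract the row maximum before exponentiating so that exp() cannot overflow. A hedged sketch of such a variant (the result is unchanged, since softmax is shift-invariant per row):

```r
# Numerically stable row-wise softmax: subtracting each row's maximum
# does not change the probabilities (softmax is shift-invariant), but
# it keeps exp() from overflowing to Inf for large scores.
softmax_stable <- function(x) {
  shifted <- sweep(x, 1, apply(x, 1, max), '-')
  score.exp <- exp(shifted)
  sweep(score.exp, 1, rowSums(score.exp), '/')
}

m <- matrix(c(1000, 1001, 999), nrow = 1)  # naive exp(1000) is Inf in R
softmax_stable(m)                          # finite probabilities summing to 1
```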