
Difficulty understanding required matrix layout for mxnet nn


I have a data frame with columns 2:37 as predictor variables and column 1 as a binary response variable.

mx.set.seed(1234)
train.x = data.matrix(A3n.df[,2:37])
train.y = A3n.df[,1]

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=12)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=1)
logoutput <- mx.symbol.LogisticRegressionOutput(fc2, name="logoutput")

A1.MXmodel <- mx.model.FeedForward.create(logoutput, X=train.x, y=train.y,
                                     ctx=mx.gpu(), num.round=1000, array.batch.size=100,
                                     learning.rate=0.01, momentum=0.9,  eval.metric=mx.metric.accuracy,
                                     initializer=mx.init.uniform(0.07),
                                     epoch.end.callback=mx.callback.log.train.metric(100))

Leads to error:

Error in mx.io.internal.arrayiter(as.array(data), as.array(label), unif.rnds,  : 
  io.cc:50: Seems X, y was passed in a Row major way, MXNetR adopts a column major convention.
Please pass in transpose of X instead

Just a few days ago I used:

train.x <- t(train.x)

This fixed the error and yielded an error rate low enough to be believable, but today it's stuck near 0.50 with no learning. I also tried switching array.layout between rowmajor and colmajor, to no effect.

[16] Train-accuracy=0.460714285714286
[17] Train-accuracy=0.460714285714286
[18] Train-accuracy=0.460714285714286
[19] Train-accuracy=0.460714285714286
[20] Train-accuracy=0.460714285714286
[993] Train-accuracy=0.460714285714286
[994] Train-accuracy=0.460714285714286
[995] Train-accuracy=0.460714285714286
[996] Train-accuracy=0.460714285714286
[997] Train-accuracy=0.460714285714286
[998] Train-accuracy=0.460714285714286
[999] Train-accuracy=0.460714285714286
[1000] Train-accuracy=0.460714285714286

Solution

  • There are a few things you need to change in the call to mx.model.FeedForward.create to make it work:

    1. Remove the transpose of train.x
    2. Remove eval.metric=mx.metric.accuracy (or replace it with eval.metric=mx.metric.rmse if you want to see training progress)
    3. (Optional) Set array.layout = "rowmajor" to indicate that your examples are in rows and features are in columns. MXNet is smart enough to detect this automatically, but setting it explicitly removes the nasty message in the output.

    The final call will look like this:

    A1.MXmodel <- mx.model.FeedForward.create(logoutput, X=train.x, y=train.y,
                                          ctx=mx.gpu(), num.round=1000, array.batch.size=100,
                                          learning.rate=0.01, momentum=0.9,
                                          initializer=mx.init.uniform(0.07),
                                          eval.metric=mx.metric.rmse,
                                          array.layout = "rowmajor",
                                          epoch.end.callback=mx.callback.log.train.metric(100))
    

    The thing is that Accuracy as an evaluation metric cannot work with the output of the logistic regression symbol. If you take a look at the example of how Accuracy is calculated (sorry, Python only), you will notice that the number of elements in each sub-array should be equal to the number of classes. But LogisticRegressionOutput produces only 1 output, so it cannot be used directly with the Accuracy metric.
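
    To see why, here is a rough sketch in R of what an accuracy metric does (illustration only, not the actual mx.metric.accuracy implementation): it takes the arg-max over the class dimension of the predictions and compares it against the integer labels.

      # Sketch of an accuracy metric; pred is assumed to be a
      # (num_classes x batch_size) matrix, one column per example.
      accuracy_sketch <- function(label, pred) {
        ypred <- max.col(t(pred))   # predicted class per example (1-based)
        mean((label + 1) == ypred)  # labels are assumed to be 0-based
      }

      # With LogisticRegressionOutput, pred has a single row, so
      # max.col() always returns 1 and the "accuracy" is meaningless.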

    If you still want to use Accuracy metric, then you need to:

    1. Set the number of hidden units of fc2 to 2 (the number of classes):

      fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=2)

    2. Use SoftmaxOutput as the final layer:

      logoutput <- mx.symbol.SoftmaxOutput(fc2, name="logoutput")

    SoftmaxOutput produces 2 outputs, equal to the number of units in the final fully connected layer, so accuracy will be calculated correctly.
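
    Putting both changes together, the training call would look roughly like this (an untested sketch reusing the same hyperparameters as above; note that SoftmaxOutput expects 0-based integer class labels, which a 0/1 binary response already satisfies):

      fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=2)
      logoutput <- mx.symbol.SoftmaxOutput(fc2, name="logoutput")

      A1.MXmodel <- mx.model.FeedForward.create(logoutput, X=train.x, y=train.y,
                                            ctx=mx.gpu(), num.round=1000, array.batch.size=100,
                                            learning.rate=0.01, momentum=0.9,
                                            initializer=mx.init.uniform(0.07),
                                            eval.metric=mx.metric.accuracy,
                                            array.layout = "rowmajor",
                                            epoch.end.callback=mx.callback.log.train.metric(100))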

    Cheers.