machine-learning neural-network deep-learning mxnet

mxnet training not progressing

Thanks in advance for any help.

I am having some issues getting an mxnet model to converge to anything: it seems stuck close to its initial weights.

A working example (although I have struggled to get many such models working today). I have tried the approach below with a range of epochs (up to 100), and a range of learning rates (0.001 to 10), and cannot get anything sensible out of this.

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
ac3 = mx.sym.Activation(data=fc3, act_type='relu')

output = mx.sym.FullyConnected(data=ac3, num_hidden=1)
loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.1},
          eval_metric='mse',
          num_epoch=5)

gives rise to:

INFO:root:Epoch[0] Train-mse=0.221155
INFO:root:Epoch[0] Time cost=0.173
INFO:root:Epoch[1] Train-mse=0.225179
INFO:root:Epoch[1] Time cost=0.176
INFO:root:Epoch[2] Train-mse=0.225179
INFO:root:Epoch[2] Time cost=0.179
INFO:root:Epoch[3] Train-mse=0.225179
INFO:root:Epoch[3] Time cost=0.176
INFO:root:Epoch[4] Train-mse=0.225179
INFO:root:Epoch[4] Time cost=0.183

where it's clear the training isn't really progressing.

Solution

I took your code and updated it a bit, and was able to make it converge, code is pasted below.

Updates I made: I Updated the layers, to have only two fully connected layers, with 128 units each, updated the loss function to use the built in Linear Regression, added Momentum and updated the learning rate, and lastly - running more epochs

Hope this helps!

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=128)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

output = mx.sym.FullyConnected(data=ac2, num_hidden=1)
#loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")
loss = mx.sym.LinearRegressionOutput(data=output, label=label, name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
          eval_metric='mse',
          num_epoch=50)

Results:

INFO:root:Epoch[0] Train-mse=0.076923
INFO:root:Epoch[0] Time cost=0.148
INFO:root:Epoch[1] Train-mse=0.061155
INFO:root:Epoch[1] Time cost=0.178
INFO:root:Epoch[2] Train-mse=0.061154
INFO:root:Epoch[2] Time cost=0.168
INFO:root:Epoch[3] Train-mse=0.061153
INFO:root:Epoch[3] Time cost=0.151
INFO:root:Epoch[4] Train-mse=0.061151
INFO:root:Epoch[4] Time cost=0.182
INFO:root:Epoch[5] Train-mse=0.061150
INFO:root:Epoch[5] Time cost=0.186
INFO:root:Epoch[6] Train-mse=0.061149
INFO:root:Epoch[6] Time cost=0.197
INFO:root:Epoch[7] Train-mse=0.061147
INFO:root:Epoch[7] Time cost=0.174
INFO:root:Epoch[8] Train-mse=0.061145
INFO:root:Epoch[8] Time cost=0.148
INFO:root:Epoch[9] Train-mse=0.061142
INFO:root:Epoch[9] Time cost=0.150
INFO:root:Epoch[10] Train-mse=0.061140
INFO:root:Epoch[10] Time cost=0.145
INFO:root:Epoch[11] Train-mse=0.061136
INFO:root:Epoch[11] Time cost=0.135
INFO:root:Epoch[12] Train-mse=0.061133
INFO:root:Epoch[12] Time cost=0.136
INFO:root:Epoch[13] Train-mse=0.061128
INFO:root:Epoch[13] Time cost=0.137
INFO:root:Epoch[14] Train-mse=0.061122
INFO:root:Epoch[14] Time cost=0.146
INFO:root:Epoch[15] Train-mse=0.061116
INFO:root:Epoch[15] Time cost=0.135
INFO:root:Epoch[16] Train-mse=0.061108
INFO:root:Epoch[16] Time cost=0.152
INFO:root:Epoch[17] Train-mse=0.061098
INFO:root:Epoch[17] Time cost=0.179
INFO:root:Epoch[18] Train-mse=0.061086
INFO:root:Epoch[18] Time cost=0.160
INFO:root:Epoch[19] Train-mse=0.061069
INFO:root:Epoch[19] Time cost=0.151
INFO:root:Epoch[20] Train-mse=0.061050
INFO:root:Epoch[20] Time cost=0.145
INFO:root:Epoch[21] Train-mse=0.061024
INFO:root:Epoch[21] Time cost=0.164
INFO:root:Epoch[22] Train-mse=0.060990
INFO:root:Epoch[22] Time cost=0.151
INFO:root:Epoch[23] Train-mse=0.060944
INFO:root:Epoch[23] Time cost=0.141
INFO:root:Epoch[24] Train-mse=0.060881
INFO:root:Epoch[24] Time cost=0.136
INFO:root:Epoch[25] Train-mse=0.060790
INFO:root:Epoch[25] Time cost=0.124
INFO:root:Epoch[26] Train-mse=0.060658
INFO:root:Epoch[26] Time cost=0.151
INFO:root:Epoch[27] Train-mse=0.060455
INFO:root:Epoch[27] Time cost=0.166
INFO:root:Epoch[28] Train-mse=0.060131
INFO:root:Epoch[28] Time cost=0.148
INFO:root:Epoch[29] Train-mse=0.059582
INFO:root:Epoch[29] Time cost=0.219
INFO:root:Epoch[30] Train-mse=0.058581
INFO:root:Epoch[30] Time cost=0.160
INFO:root:Epoch[31] Train-mse=0.056593
INFO:root:Epoch[31] Time cost=0.178
INFO:root:Epoch[32] Train-mse=0.052252
INFO:root:Epoch[32] Time cost=0.184
INFO:root:Epoch[33] Train-mse=0.042274
INFO:root:Epoch[33] Time cost=0.168
INFO:root:Epoch[34] Train-mse=0.023321
INFO:root:Epoch[34] Time cost=0.162
INFO:root:Epoch[35] Train-mse=0.005860
INFO:root:Epoch[35] Time cost=0.161
INFO:root:Epoch[36] Train-mse=0.000848
INFO:root:Epoch[36] Time cost=0.164
INFO:root:Epoch[37] Train-mse=0.000319
INFO:root:Epoch[37] Time cost=0.176
INFO:root:Epoch[38] Train-mse=0.000221
INFO:root:Epoch[38] Time cost=0.148
INFO:root:Epoch[39] Train-mse=0.000163
INFO:root:Epoch[39] Time cost=0.199
INFO:root:Epoch[40] Train-mse=0.000123
INFO:root:Epoch[40] Time cost=0.141
INFO:root:Epoch[41] Train-mse=0.000096
INFO:root:Epoch[41] Time cost=0.133
INFO:root:Epoch[42] Train-mse=0.000078
INFO:root:Epoch[42] Time cost=0.144
INFO:root:Epoch[43] Train-mse=0.000065
INFO:root:Epoch[43] Time cost=0.174
INFO:root:Epoch[44] Train-mse=0.000056
INFO:root:Epoch[44] Time cost=0.208
INFO:root:Epoch[45] Train-mse=0.000050
INFO:root:Epoch[45] Time cost=0.152
INFO:root:Epoch[46] Train-mse=0.000045
INFO:root:Epoch[46] Time cost=0.154
INFO:root:Epoch[47] Train-mse=0.000041
INFO:root:Epoch[47] Time cost=0.151
INFO:root:Epoch[48] Train-mse=0.000039
INFO:root:Epoch[48] Time cost=0.177
INFO:root:Epoch[49] Train-mse=0.000036
INFO:root:Epoch[49] Time cost=0.135