Tags: machine-learning, neural-network, deep-learning, mxnet, gluon

Error defining a simple neural network in MXNet


I am building a simple neural network using MXNet, but I am running into a problem with the step() method:

x1.shape=(64, 1, 1000)
y1.shape=(64, 1, 10)
net = nn.Sequential()
net.add(nn.Dense(H,activation='relu'),nn.Dense(90,activation='relu'),nn.Dense(D_out))
for t in range(500):
    #y_pred = net(x1)

    #loss = loss_fn(y_pred, y)
    #for i in range(len(x1)):

    with autograd.record():
        output=net(x1)
        loss =loss_fn(output,y1)
    loss.backward()
    trainer.step(64)
    if t % 100 == 99:
        print(t, loss)
        #optimizer.zero_grad()

UserWarning: Gradient of Parameter dense30_weight on context cpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient


Solution

  • The warning indicates that the trainer holds parameters that are not part of the computational graph you ran backward on. You need to initialize the parameters of your model and define the trainer from that same model's parameters (net.collect_params()). Unlike PyTorch, you don't need to call zero_grad in MXNet, because by default new gradients overwrite the old ones instead of being accumulated. The following code shows a simple neural network implemented using MXNet's Gluon API:

    import mxnet as mx
    from mxnet import autograd, gluon, nd

    # Context and problem size. These values are assumptions:
    # the snippet as posted left them undefined.
    model_ctx = mx.cpu()
    num_inputs = 2
    num_examples = 10000

    # Define model, loss and trainer; the trainer is built from this net's parameters
    net = gluon.nn.Dense(1)
    net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)
    square_loss = gluon.loss.L2Loss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})

    # Create random input and labels
    def real_fn(X):
        return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

    X = nd.random_normal(shape=(num_examples, num_inputs))
    noise = 0.01 * nd.random_normal(shape=(num_examples,))
    y = real_fn(X) + noise

    # Define Dataloader
    batch_size = 4
    train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)
    num_batches = num_examples / batch_size

    for e in range(10):
        # Reset the running loss at the start of every epoch
        cumulative_loss = 0

        # Iterate over training batches
        for i, (data, label) in enumerate(train_data):

            # Load data on the CPU
            data = data.as_in_context(model_ctx)
            label = label.as_in_context(model_ctx)

            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)

            # Backpropagation
            loss.backward()
            trainer.step(batch_size)

            # Add this batch's mean loss to the running total
            cumulative_loss += nd.mean(loss).asscalar()

        # Average of the per-batch mean losses for this epoch
        print("Epoch %s, loss: %s" % (e, cumulative_loss / num_batches))
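
To see why a trainer that holds the wrong parameters produces this warning, here is a minimal pure-Python sketch of the idea. It is a simplified model of the mechanism, not MXNet's actual implementation: the classes Param, ToyNet and ToyTrainer and the fresh_grad flag are hypothetical names made up for illustration. The scenario it plays out (a network rebuilt while the trainer still points at the earlier one) is only one plausible reading of the dense30_weight name in the warning, which the question does not confirm.

```python
class Param:
    """Hypothetical stand-in for a model parameter with a gradient buffer."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.grad = 0.0
        self.fresh_grad = False  # set by backward, cleared by step


class ToyNet:
    """One-weight model y = w * x with loss 0.5 * (w * x - y) ** 2."""
    def __init__(self, prefix):
        self.w = Param(prefix + "_weight", 0.0)

    def backward(self, x, y):
        # The new gradient overwrites the old one (no accumulation),
        # so nothing like zero_grad is needed between iterations.
        self.w.grad = (self.w.value * x - y) * x
        self.w.fresh_grad = True


class ToyTrainer:
    """Plain SGD that refuses to update parameters with stale gradients."""
    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def step(self):
        stale = [p.name for p in self.params if not p.fresh_grad]
        if stale:
            raise UserWarning("stale gradient: " + ", ".join(stale))
        for p in self.params:
            p.value -= self.lr * p.grad
            p.fresh_grad = False


# The trainer is bound to an earlier network, then the network is rebuilt.
old_net = ToyNet("dense30")
trainer = ToyTrainer([old_net.w], lr=0.1)
net = ToyNet("dense33")

net.backward(x=1.0, y=2.0)      # refreshes net's gradient, not old_net's
try:
    trainer.step()
    message = "no warning"
except UserWarning as exc:
    message = str(exc)
print(message)

# Fix: build the trainer from the parameters of the network being trained.
trainer = ToyTrainer([net.w], lr=0.1)
net.backward(x=1.0, y=2.0)
trainer.step()
print(net.w.value)
```

In Gluon terms, the second half corresponds to creating gluon.Trainer from net.collect_params() after the final net has been defined and initialized, and creating it again whenever the network is rebuilt.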