Tags: python, neural-network, deep-learning, caffe, lmdb

Caffe: Extremely high loss while learning simple linear functions


I'm trying to train a neural net to learn the function y = x1 + x2 + x3. The objective is to play around with Caffe in order to learn and understand it better. The data are generated synthetically in Python and written to an LMDB database.

Code for data generation:

import numpy as np
import lmdb
import caffe

Ntrain = 100
Ntest = 20
K = 3
H = 1
W = 1

Xtrain = np.random.randint(0,1000, size = (Ntrain,K,H,W))
Xtest = np.random.randint(0,1000, size = (Ntest,K,H,W))

ytrain = Xtrain[:,0,0,0] + Xtrain[:,1,0,0] + Xtrain[:,2,0,0]
ytest = Xtest[:,0,0,0] + Xtest[:,1,0,0] + Xtest[:,2,0,0]

env = lmdb.open('expt/expt_train')

for i in range(Ntrain):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtrain.shape[1]
    datum.height = Xtrain.shape[2]
    datum.width = Xtrain.shape[3]
    datum.data = Xtrain[i].tobytes()
    datum.label = int(ytrain[i])
    str_id = '{:08}'.format(i)

    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())


env = lmdb.open('expt/expt_test')

for i in range(Ntest):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtest.shape[1]
    datum.height = Xtest.shape[2]
    datum.width = Xtest.shape[3]
    datum.data = Xtest[i].tobytes()
    datum.label = int(ytest[i])
    str_id = '{:08}'.format(i)

    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
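
As a sanity check (a sketch, not part of the original workflow; it assumes the script above has just run, so the imports and Xtrain are still in scope), one entry can be read back from the database and decoded:

env = lmdb.open('expt/expt_train', readonly=True)
with env.begin() as txn:
    raw = txn.get(b'00000000')   # key written for i = 0

datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw)

# datum.data is a raw byte string; decode it with the same dtype the array
# was written with, then restore the (channels, height, width) shape
arr = np.frombuffer(datum.data, dtype=Xtrain.dtype).reshape(
    datum.channels, datum.height, datum.width)
print(arr.squeeze(), datum.label)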

Solver prototxt file:

net: "expt/expt.prototxt"

display: 1
max_iter: 200
test_iter: 20
test_interval: 100

base_lr: 0.000001
momentum: 0.9
# weight_decay: 0.0005

lr_policy: "inv"
# gamma: 0.5
# stepsize: 10
# power: 0.75

snapshot_prefix: "expt/expt"
snapshot_diff: true

solver_mode: CPU
solver_type: SGD

debug_info: true

Caffe model (expt/expt.prototxt):

name: "expt"


layer {
    name: "Expt_Data_Train"
    type: "Data"
    top: "data"
    top: "label"    

    include {
        phase: TRAIN
    }

    data_param {
        source: "expt/expt_train"
        backend: LMDB
        batch_size: 1
    }
}


layer {
    name: "Expt_Data_Validate"
    type: "Data"
    top: "data"
    top: "label"    

    include {
        phase: TEST
    }

    data_param {
        source: "expt/expt_test"
        backend: LMDB
        batch_size: 1
    }
}


layer {
    name: "IP"
    type: "InnerProduct"
    bottom: "data"
    top: "ip"

    inner_product_param {
        num_output: 1

        weight_filler {
            type: 'constant'
        }

        bias_filler {
            type: 'constant'
        }
    }
}


layer {
    name: "Loss"
    type: "EuclideanLoss"
    bottom: "ip"
    bottom: "label"
    top: "loss"
}
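
For completeness, assuming the solver above is saved as expt/solver.prototxt and the net as expt/expt.prototxt (the solver filename is an assumption here), training can be driven from Python like this:

import caffe

caffe.set_mode_cpu()                        # matches solver_mode: CPU
solver = caffe.SGDSolver('expt/solver.prototxt')
solver.solve()                              # runs for max_iter iterations

# the learned weights and bias of the "IP" layer can then be inspected
print(solver.net.params['IP'][0].data)      # weights
print(solver.net.params['IP'][1].data)      # bias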

The loss on the test data that I'm getting is 233,655. This is shocking, as the loss is orders of magnitude larger than the values in the training and test sets. Also, the function to be learned is a simple linear one. I can't figure out what is wrong in the code. Any suggestions/inputs are much appreciated.


Solution

  • The loss is so large in this case because Caffe expects the data (i.e. datum.data) in uint8 format and the labels (datum.label) in int32 format. For the labels, the numpy.int64 format also seems to work. I think datum.data is expected in uint8 because Caffe was developed primarily for computer-vision tasks, where the inputs are images whose RGB values lie in the [0, 255] range; uint8 captures this using the least amount of memory. I made the following changes to the data generation code (an alternative using pycaffe's built-in helper is sketched at the end of this answer):

    Xtrain = np.uint8(np.random.randint(0,256, size = (Ntrain,K,H,W)))
    Xtest = np.uint8(np.random.randint(0,256, size = (Ntest,K,H,W)))
    
    # cast to int64 before summing so the labels don't overflow in uint8
    ytrain = Xtrain[:,0,0,0].astype(np.int64) + Xtrain[:,1,0,0].astype(np.int64) + Xtrain[:,2,0,0].astype(np.int64)
    ytest = Xtest[:,0,0,0].astype(np.int64) + Xtest[:,1,0,0].astype(np.int64) + Xtest[:,2,0,0].astype(np.int64)
    

    After playing around with the net parameters (learning rate, number of iterations, etc.) I'm getting an error on the order of 10^(-6), which I think is pretty good!
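
    As an aside, pycaffe also ships a helper, caffe.io.array_to_datum, that handles this encoding automatically: uint8 arrays are stored as raw bytes in datum.data, anything else goes into datum.float_data. A minimal sketch of the training-set write loop using it (paths and shapes follow the question's code):

    import numpy as np
    import lmdb
    from caffe.io import array_to_datum

    Ntrain, K, H, W = 100, 3, 1, 1
    Xtrain = np.uint8(np.random.randint(0, 256, size=(Ntrain, K, H, W)))
    ytrain = Xtrain[:, 0, 0, 0].astype(np.int64) + Xtrain[:, 1, 0, 0].astype(np.int64) \
             + Xtrain[:, 2, 0, 0].astype(np.int64)

    env = lmdb.open('expt/expt_train')
    with env.begin(write=True) as txn:      # one transaction for all writes
        for i in range(Ntrain):
            # fills channels/height/width from the array shape and, since the
            # array is uint8, stores the raw bytes in datum.data
            datum = array_to_datum(Xtrain[i], label=int(ytrain[i]))
            txn.put('{:08}'.format(i).encode('ascii'), datum.SerializeToString())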