
Initialize TensorFlow CNN model with Numpy weight matrices


I am manually converting a pretrained MatConvNet model to a TensorFlow model. I have pulled the weights and biases from the MatConvNet model's .mat file using scipy.io, which gives me numpy arrays for both.

Code snippets where data is a dictionary returned from scipy.io:

for i in data['net2']['layers']:
    if i.type == 'conv':
        model.append({'weights': i.weights[0], 'bias': i.weights[1], 'stride': i.stride, 'padding': i.pad, 'momentum': i.momentum,'lr': i.learningRate,'weight_decay': i.weightDecay})

...

weights = {
    'wc1': tf.Variable(model[0]['weights']), 
    'wc2': tf.Variable(model[2]['weights']),
    'wc3': tf.Variable(model[4]['weights']),
    'wc4': tf.Variable(model[6]['weights'])
}

...

Here model[0]['weights'] is, for example, the 4x4x60 numpy array pulled from the MatConvNet model for that layer. This is how I define the placeholder for the 9x9 inputs:

X = tf.placeholder(tf.float32, [None, 9, 9]) #also tried with [None, 81] with a tf.reshape, [None, 9, 9, 1]

Current Issue: I cannot get the ranks to match up. I consistently get a ValueError:

ValueError: Shape must be rank 4 but is rank 3 for 'Conv2D' (op: 'Conv2D') with input shapes: [?,9,9], [4,4,60]  

Summary

  • Is it possible to explicitly define a tensorflow model's weights from numpy arrays?
  • Why is the rank for my weight matrices 4? Should my numpy array be something more like [?, 4, 4, 60], and can I make it that way?

Unsuccessfully Attempted:

  • Rotating numpy matrices: I know that MATLAB and Python index differently (1-based vs. 0-based, and column-major vs. row-major). Even though I believe I have converted everything appropriately, I still experimented with functions like np.rot90(), changing the 4x4x60 array to 60x4x4.
  • Using tf.reshape: I have attempted to call reshape on the weights after wrapping them in tf.Variable, but I get "Variable has no attribute 'reshape'".
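
For reference, here is a small numpy sketch of what these axis operations do to a 4x4x60 array. The array `w` is a made-up stand-in, not the real MatConvNet weights. It suggests that rotating or reordering axes leaves the rank at 3, while inserting a length-1 axis is what raises it to 4.

```python
import numpy as np

# Hypothetical stand-in for one layer's weights: 4x4 kernels, 60 filters.
w = np.arange(4 * 4 * 60, dtype=np.float32).reshape(4, 4, 60)

rotated = np.rot90(w)               # rotates in the plane of axes (0, 1); rank stays 3
moved = np.transpose(w, (2, 0, 1))  # reorders axes to (60, 4, 4); rank stays 3
expanded = w[:, :, np.newaxis, :]   # inserts a length-1 axis; rank becomes 4

print(rotated.shape, moved.shape, expanded.shape)
```

The inserted axis does not change any values: `expanded[:, :, 0, :]` is the original array.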

NOTE: I am aware that there are a number of scripts to go from matconvnet to caffe, and from caffe to tensorflow (as described here, for example, https://github.com/vlfeat/matconvnet/issues/1021). My question is specifically about tensorflow weight initialization options.


Solution

  • I got over this hurdle by calling the function tf.reshape(...) instead of the method weights['wc1'].reshape(...). I am not yet certain about the performance, or whether this is a horribly naive endeavor.

    UPDATE: After further testing, this approach appears to work, at least functionally: I have created a TensorFlow CNN model that runs and produces predictions that appear consistent with the MatConvNet model. I make no claims about how the accuracies of the two compare.

    I am sharing my code. In my case it was a very small network, so if you are attempting to use this code for your own matconvnet to tensorflow project, you will likely need many more modifications: https://github.com/melissadale/MatConv2TensorFlow
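
The rank-4 requirement in the error comes from the layouts tf.nn.conv2d expects: input as [batch, height, width, channels] (the default NHWC format) and filters as [filter_height, filter_width, in_channels, out_channels]. Below is a minimal numpy sketch of the reshaping. It assumes the 4x4x60 array holds 60 single-channel 4x4 filters whose length-1 input-channel axis was dropped on load. The helpers `to_hwio`, `to_nhwc`, and `conv2d_valid` are hypothetical names made up for this illustration, and the data is random. The naive convolution is only there to check that the shapes line up without needing TensorFlow.

```python
import numpy as np

def to_hwio(w):
    """(H, W, out) -> (H, W, 1, out), cast to float32 to match a tf.float32 input."""
    return w[:, :, np.newaxis, :].astype(np.float32)

def to_nhwc(x):
    """(N, H, W) -> (N, H, W, 1), cast to float32."""
    return x[..., np.newaxis].astype(np.float32)

def conv2d_valid(x, f):
    """Naive stride-1, no-padding cross-correlation; x is NHWC, f is HWIO."""
    n, h, w, c = x.shape
    fh, fw, fc, fo = f.shape
    assert c == fc, "input channels must match the filter's in_channels"
    out = np.zeros((n, h - fh + 1, w - fw + 1, fo), dtype=x.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + fh, j:j + fw, :]  # shape (n, fh, fw, c)
            out[:, i, j, :] = np.tensordot(patch, f, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
w3 = rng.standard_normal((4, 4, 60))  # stand-in for model[0]['weights']
x3 = rng.standard_normal((2, 9, 9))   # stand-in batch of two 9x9 inputs

f4 = to_hwio(w3)  # shape (4, 4, 1, 60)
x4 = to_nhwc(x3)  # shape (2, 9, 9, 1)
y = conv2d_valid(x4, f4)
print(f4.shape, x4.shape, y.shape)

# Not run here: in TensorFlow, f4 could be wrapped as tf.Variable(f4) and the
# input declared with shape [None, 9, 9, 1].
```

Since tf.nn.conv2d requires the input and filter to share a dtype, the cast to float32 guards against weights that load as float64. Whether the stride and padding match the original MatConvNet layer still has to be checked against the values stored for each layer.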