How to Create a Trained Keras Model Through Setting the Weights


I am trying to convert a trained Chainer model into a trained Keras model, with the goal of then converting it to Core ML. My approach is to set the weights directly on a Keras model instantiated with the same architecture as the Chainer model. While debugging, I noticed that the weight matrices in Keras have transposed shapes relative to Chainer, so I transpose them before setting them. The problem is that the outputs of the two models differ: in the Keras model, the first layer gets some of the outputs correct, but most are zeroed out in a seemingly unpredictable fashion. Are there other parameters of a trained Keras model that I'm missing?

import argparse
import os
import sys

import chainer
import cv2 as cv
import numpy as np
import keras
from keras.layers import Dense, Input, Activation
from keras.models import Model
from keras.utils import plot_model

import evaluation_util

sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
import projection_gan

def create_keras_model():
    inputs = Input(shape=(34,))

    l1 = Dense(1024, activation='relu')(inputs)
    l2 = Dense(1024, activation='relu')(l1)
    l3 = Dense(1024)(l2)
    l3 = keras.layers.add([l1,l3])
    l3 = Activation('relu')(l3)
    l4 = Dense(17)(l3)

    model = Model(inputs=inputs, outputs=l4)
    return model

def main(args):
    model = evaluation_util.load_model(vars(args))
    chainer.serializers.load_npz(args.lift_model, model)
    keras_model = create_keras_model()
    plot_model(keras_model, to_file='model.png')
    weights_list = [model.l1.W.array.transpose(), model.l1.b.array,
                    model.l2.W.array.transpose(), model.l2.b.array,
                    model.l3.W.array.transpose(), model.l3.b.array,
                    model.l4.W.array.transpose(), model.l4.b.array]
    keras_model.set_weights(weights_list)
    keras_model.save("keras.h5")

Sample Output from the first layer:

Chainer (correct model):

0.012310047, -0.0038410246, 0.019623855, 0.01872946, -0.010116328, ...

Keras:

0.012310054, 0.0, 0.0, 0.01872946, 0.0, ...


Solution

  • In Keras, a layer can be defined together with its activation function, whereas Chainer's L.Linear performs only the linear operation, without any activation function.

    Because you define the first layer as l1 = Dense(1024, activation='relu')(inputs), it computes the linear operation followed by ReLU, which sets every negative value to 0.

    That is why the first-layer output of your Keras model contains only non-negative values.

    The weights themselves are probably fine.
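
To make the point above concrete, here is a minimal sketch in plain Python (no Chainer or Keras required) that mimics both layer conventions. The helper names `linear` and `dense` and the toy weights are hypothetical and chosen only for illustration; they are not taken from the model in the question. It assumes the usual layouts: a Chainer L.Linear weight of shape (out, in) and a Keras Dense kernel of shape (in, out).

```python
def linear(x, W, b):
    """Chainer-style linear layer: W is laid out as (out_features, in_features)."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def dense(x, kernel, b, activation=None):
    """Keras-style Dense layer: kernel is laid out as (in_features, out_features)."""
    out = [sum(x_i * kernel[i][j] for i, x_i in enumerate(x)) + b[j]
           for j in range(len(b))]
    if activation == "relu":
        # ReLU clamps every negative entry to zero.
        out = [max(0.0, v) for v in out]
    return out


# Hypothetical toy weights (2 inputs, 3 outputs), not from the real model.
W = [[1.0, 2.0], [0.5, -1.0], [-1.0, 0.0]]   # Chainer layout: (out, in)
b = [0.0, 0.25, 0.5]
x = [1.0, 1.0]

# Transpose to the Keras layout: (in, out).
kernel = [list(col) for col in zip(*W)]

pre_activation = linear(x, W, b)                      # what L.Linear alone returns
keras_linear = dense(x, kernel, b)                    # Dense without activation
keras_relu = dense(x, kernel, b, activation="relu")   # Dense with activation='relu'

print(pre_activation)
print(keras_linear)
print(keras_relu)
```

With the transposed weights, the Dense layer without an activation matches the linear output, and only the ReLU variant zeroes the negative entries. To compare the two real models like for like, one option is to compare the Keras layer output against the Chainer output after its ReLU has been applied, or to declare the Keras layer as Dense(1024) followed by a separate Activation('relu') so the pre-activation tensor can be inspected.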