Tags: android, keras, deeplearning4j, dl4j

Different prediction: Keras vs. Android + DL4J


I get vastly different prediction results when comparing the output of a neural network trained on a GPU in Python (3.5.5) + Keras (2.0.8) against the output of the same network on Android (API 24) using DL4J (1.0.0-beta2).

It would be very helpful if someone could share their experience on how to tackle this problem. Thank you!

Importing the model into Android

The neural network was converted to DL4J format by importing it using:

MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(SIMPLE_MLP, false);

and storing it using DL4J's ModelSerializer.
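For reference, a minimal sketch of that conversion step, assuming the Keras HDF5 path is held in SIMPLE_MLP and the output file name model.zip is just an example:

import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

import java.io.File;

// Path to the Keras HDF5 file produced by model.save(...) in Python (hypothetical file name)
String SIMPLE_MLP = "simple_mlp.h5";

// Import the Keras Sequential model; 'false' skips the training configuration
MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(SIMPLE_MLP, false);

// Store the network in DL4J's own format so it can be shipped to the Android app
ModelSerializer.writeModel(model, new File("model.zip"), false);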

The model is imported into the Android application using the DL4J method restoreMultiLayerNetwork().
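For completeness, a minimal sketch of the restore step on Android, assuming the serialized model is bundled as a raw resource (the resource name R.raw.model is an assumption):

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

import java.io.InputStream;

// Load the serialized network from res/raw (hypothetical resource name)
InputStream modelStream = getResources().openRawResource(R.raw.model);
MultiLayerNetwork model = ModelSerializer.restoreMultiLayerNetwork(modelStream);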

Model Output

The neural network is designed to make predictions on images of a fixed input shape: fixed height, fixed width, 3 channels.

Image preprocessing pipeline in Android:

The image is loaded as an InputStream from the device and stored in an INDArray:

AndroidNativeImageLoader loader = new AndroidNativeImageLoader(100, 100, 3);

InputStream inputStream_bitmap = getContentResolver().openInputStream(uri);
INDArray indarray1 = loader.asMatrix(inputStream_bitmap);

AndroidNativeImageLoader loads and rescales the image.

The INDArray indarray1 is rescaled so that its values lie in the range [0, 1]:

indarray1 = indarray1.divi(255);

The INDarray is passed through the network to compute the output:

INDArray output = model.output(indarray1);

Image preprocessing pipeline in Python:

from keras.preprocessing import image
from keras.utils import np_utils
import numpy as np

img = image.load_img(img_path, target_size=(100, 100))
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)
img = img.astype('float32')/255

output = model.predict(img)

Problem:

The prediction using Python and Keras differs significantly from the prediction in Android using DL4J. The output is an array of 2 values, each a float in [0,1]. The difference in prediction for a normal .bmp picture taken by a camera is up to 0.99 per element of this output array.

Tests done so far:

When using a monochromatic .bmp image (only red, only blue, only green, or completely white), the prediction results are almost the same in both environments. They differ only by about 10e-3, which can be explained by training on the GPU and running inference on the CPU.

Conclusion: So far I believe that the image preprocessing on Android is done differently than in Python, since the model output is the same for monochromatic pictures.
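One way to test this hypothesis on the Android side is to load a picture that is pure red and print the per-channel means of the resulting tensor: if the large mean shows up in channel 2 rather than channel 0, the loader is delivering BGR instead of RGB. A minimal sketch, assuming redUri points at such a test image:

import org.datavec.image.loader.AndroidNativeImageLoader;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.indexing.NDArrayIndex;

import java.io.InputStream;

// Load a pure-red test image (redUri is a hypothetical Uri)
AndroidNativeImageLoader loader = new AndroidNativeImageLoader(100, 100, 3);
InputStream redStream = getContentResolver().openInputStream(redUri);
INDArray red = loader.asMatrix(redStream).divi(255);

// NativeImageLoader returns shape [1, channels, height, width].
// With RGB ordering the red plane is channel 0; with BGR it is channel 2.
for (int c = 0; c < 3; c++) {
    double mean = red.get(NDArrayIndex.point(0), NDArrayIndex.point(c),
            NDArrayIndex.all(), NDArrayIndex.all()).meanNumber().doubleValue();
    android.util.Log.d("ChannelCheck", "channel " + c + " mean = " + mean);
}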

Has anyone experienced a similar problem? Any help is much appreciated!


Solution

  • DL4J's NativeImageLoader on Android delivers images in BGR order instead of RGB, so a color format conversion must be performed.

    Kudos go to @saudet for this GitHub post:

    https://github.com/deeplearning4j/deeplearning4j/issues/6495

    NativeImageLoader needs to be created with this conversion:

    loader = new NativeImageLoader(100, 100, 3, new ColorConversionTransform(COLOR_BGR2RGB));
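    A sketch of the full corrected pipeline with the conversion in place (the static import path for COLOR_BGR2RGB below matches the JavaCPP presets used around 1.0.0-beta2 and may differ in other versions):

    import static org.bytedeco.javacpp.opencv_imgproc.COLOR_BGR2RGB;

    import org.datavec.image.loader.NativeImageLoader;
    import org.datavec.image.transform.ColorConversionTransform;
    import org.nd4j.linalg.api.ndarray.INDArray;

    import java.io.InputStream;

    // Convert from OpenCV's BGR ordering to the RGB ordering the Keras model was trained on
    NativeImageLoader loader = new NativeImageLoader(100, 100, 3, new ColorConversionTransform(COLOR_BGR2RGB));

    InputStream inputStream_bitmap = getContentResolver().openInputStream(uri);
    INDArray indarray1 = loader.asMatrix(inputStream_bitmap).divi(255);
    INDArray output = model.output(indarray1);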