
Why does a Keras Lambda layer cause a problem in Mask_RCNN?


I'm using the Mask_RCNN package from this repo: https://github.com/matterport/Mask_RCNN.

I tried to train on my own dataset using this package, but it fails with an error right at startup.

2020-11-30 12:13:16.577252: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-30 12:13:16.587017: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-11-30 12:13:16.587075: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (7612ade969e5): /proc/driver/nvidia/version does not exist
2020-11-30 12:13:16.587479: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-30 12:13:16.593569: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
2020-11-30 12:13:16.593811: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1b2aa00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-30 12:13:16.593846: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "machines.py", line 345, in <module>
    model_dir=args.logs)
  File "/content/Mask_RCNN/mrcnn/model.py", line 1837, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/content/Mask_RCNN/mrcnn/model.py", line 1934, in build
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 926, in __call__
    input_list)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1117, in _functional_construction_call
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 904, in call
    self._check_variables(created_variables, tape.watched_variables())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 931, in _check_variables
    raise ValueError(error_str)
ValueError: 
The following Variables were created within a Lambda layer (anchors)
but are not tracked by said layer:
  <tf.Variable 'anchors/Variable:0' shape=(1, 261888, 4) dtype=float32>
The layer cannot safely ensure proper Variable reuse across multiple
calls, and consquently this behavior is disallowed for safety. Lambda
layers are not well suited to stateful computation; instead, writing a
subclassed Layer is the recommend way to define layers with
Variables.

I looked up the code responsible for the problem (mrcnn/model.py, line 1935 in the repo):

    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)

If anyone has an idea how to solve it, or has already solved it, please share the solution.


Solution

  • ROOT CAUSE: The behavior of the Keras Lambda layer changed between TensorFlow 1.x and TensorFlow 2.x. In TensorFlow 1.x Keras, every variable created with tf.Variable or tf.get_variable inside the layer was automatically tracked in layer.weights through a variable creator context, so it was marked trainable and received gradients without any extra work. That mechanism conflicts with the automatic graph compilation in TensorFlow 2.x, which converts Python code into an execution graph, so it was removed. The Lambda layer now checks for variable creation and raises the error you see. In short, a Lambda layer in TensorFlow 2.x has to be stateless. If you need a variable, the correct approach in TensorFlow 2.x is to subclass the Layer class and create the weight as a member of that class.

    SOLUTIONS: There are two options:

    1. Switch to TensorFlow 1.x, where this error is not raised.

    2. Replace the Lambda layer with a subclass of the Keras Layer class:

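    import tensorflow as tf

    class AnchorsLayer(tf.keras.layers.Layer):
        """Holds the anchors as layer state instead of creating them in a Lambda."""

        def __init__(self, anchors, name="anchors", **kwargs):
            super(AnchorsLayer, self).__init__(name=name, **kwargs)
            # Create the variable once, in the constructor, so the layer owns it.
            self.anchors_v = tf.Variable(anchors)

        def call(self, inputs):
            # The input is ignored; the layer always returns the anchors.
            return self.anchors_v

    # Then replace the Lambda call in mrcnn/model.py with this:
    anchors = AnchorsLayer(anchors, name="anchors")(input_image)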
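
To try the subclassed-layer idea outside of Mask_RCNN, the following is a minimal sketch, not the repo's actual code. The anchor array and the image batch are made-up placeholders with arbitrary shapes, and the `inputs` argument of `call` is an assumption: it is accepted and ignored because a Keras layer is normally invoked on some input.

```python
import numpy as np
import tensorflow as tf


class AnchorsLayer(tf.keras.layers.Layer):
    """Holds the anchors as layer state and returns them on every call."""

    def __init__(self, anchors, name="anchors", **kwargs):
        super().__init__(name=name, **kwargs)
        # The variable is created once here, never inside call().
        self.anchors_v = tf.Variable(anchors)

    def call(self, inputs):
        # `inputs` is ignored; the layer always returns the stored anchors.
        return self.anchors_v


# Hypothetical stand-ins for the real Mask_RCNN anchors and image batch.
dummy_anchors = np.random.rand(1, 8, 4).astype(np.float32)
dummy_images = tf.zeros((1, 32, 32, 3))

layer = AnchorsLayer(dummy_anchors)
first = tf.convert_to_tensor(layer(dummy_images)).numpy()
second = tf.convert_to_tensor(layer(dummy_images)).numpy()

print(first.shape)
```

Because the variable lives on the layer object, calling the layer twice reuses the same variable instead of creating a new one each time, which is the reuse guarantee the Lambda error message says it cannot give.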