keras tensorflow2.0 normalization valueerror

Keras tutorial: error in get_normalization_layer


I am following my very first tutorial for building a classifier with Keras (https://www.tensorflow.org/tutorials/structured_data/preprocessing_layers).

I am following every instruction step by step, but I am using my own dataset.

I have one column ('speed') with float values.

This is the code proposed by the tutorial to get a normalization layer:

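# The tutorial imports the preprocessing module like this.
from tensorflow.keras.layers.experimental import preprocessing

def get_normalization_layer(name, dataset):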
  # Create a Normalization layer for our feature.
  normalizer = preprocessing.Normalization()

  # Prepare a Dataset that only yields our feature.
  feature_ds = dataset.map(lambda x, y: x[name])

  # Learn the statistics of the data.
  normalizer.adapt(feature_ds)

  return normalizer

The tutorial then applies this function to its "PhotoAmt" column (the number of photos for a pet). I apply it in the same way to my "speed" column.

speed_col = train_features['speed']
layer = get_normalization_layer('speed', train_ds)
layer(speed_col)

As far as I understand, the tutorial's "PhotoAmt" column holds integer values.

I get the following error:

/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py:184: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  accumulator.mean * accumulator.count for accumulator in accumulators
Traceback (most recent call last):
  File "keras_models.py", line 59, in <module>
    layer = get_normalization_layer('speed', train_ds)
  File "keras_models.py", line 54, in get_normalization_layer
    normalizer.adapt(feature_ds)
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/engine/base_preprocessing_layer.py", line 188, in adapt
    accumulator = self._combiner.compute(data_element, accumulator)
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py", line 173, in compute
    return self.merge([accumulator, sanitized_accumulator])
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py", line 184, in merge
    accumulator.mean * accumulator.count for accumulator in accumulators
ValueError: operands could not be broadcast together with shapes (5,) (2,) 

Whilst the expected output from the tutorial is:

<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 1.045485  ],
       [-1.1339161 ],
       [-0.19988704],
       [ 0.11145599],
       [ 0.42279902]], dtype=float32)>

(of course, I am expecting different numerical values)

I do not understand the error. Is the problem related to the fact that I am using floats instead of ints? Or are my column values malformed? I am quite sure that none of the rows has an empty or otherwise invalid value in the 'speed' column.

I am using TensorFlow 2.2.0 and Python 3.7.

Thank you all.


Solution

  • I'm not sure why, but upgrading TensorFlow with:

    pip3 install tensorflow --upgrade 
    

    solved it.

    Or at least, I could move on to the next step of the tutorial:

    # Numeric features.
    for header in ['speed']:  # and other columns
      numeric_col = tf.keras.Input(shape=(1,), name=header)
      normalization_layer = get_normalization_layer(header, train_ds)
      encoded_numeric_col = normalization_layer(numeric_col)
      all_inputs.append(numeric_col)
      encoded_features.append(encoded_numeric_col)
    

    which calls the same function as above without raising an error.

    Note that the step for "Categorical features as Integers" still does not work, but that is fine for me since I do not have such features.
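
A possible explanation for the original error, offered as an educated guess from the traceback rather than a confirmed diagnosis: in TensorFlow 2.2 the layer seems to compute its statistics over every axis except the last one. The dataset in the question yields 1-D batches of shape (batch_size,), so nothing gets reduced and each batch's statistics keep that batch's length. Merging a batch of 5 with a final batch of 2 then fails. The NumPy-only sketch below mimics that behaviour. The speed values and the helper batch_sum_keeping_last_axis are made up for illustration and are not TensorFlow code.

```python
import numpy as np

# Hypothetical 'speed' values: 7 rows split into batches of 5 and 2,
# mirroring the shapes reported in the error message.
speed = np.array([10.0, 12.5, 9.0, 15.0, 11.0, 13.5, 8.0])
batches = [speed[:5], speed[5:]]


def batch_sum_keeping_last_axis(batch):
    """Sum over every axis except the last one (illustrative helper)."""
    # For a 1-D batch there is no axis left to reduce, so the result
    # has the same length as the batch itself.
    reduce_axes = tuple(range(batch.ndim - 1))
    return batch.sum(axis=reduce_axes)


# 1-D batches: the per-batch statistics have shapes tied to the batch size.
flat_stats = [batch_sum_keeping_last_axis(b) for b in batches]
print([s.shape for s in flat_stats])  # [(5,), (2,)]

# Merging the two statistics fails, as in the traceback.
try:
    flat_stats[0] + flat_stats[1]
    merge_error = None
except ValueError as err:
    merge_error = str(err)
print(merge_error)

# With an explicit trailing feature axis, shape (batch_size, 1), every
# batch reduces to one value per feature and the merge works.
column_stats = [batch_sum_keeping_last_axis(b.reshape(-1, 1)) for b in batches]
total = column_stats[0] + column_stats[1]
print(total.shape, total)
```

If upgrading is not an option, giving the feature an explicit trailing axis before calling adapt (for example by mapping the dataset through tf.expand_dims(x[name], -1)) might avoid the mismatch. This is an untested suggestion for 2.2, not something the tutorial prescribes.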