I am following my very first tutorial on building a classifier with Keras (https://www.tensorflow.org/tutorials/structured_data/preprocessing_layers).
I am following every instruction step by step, but I am using my own dataset.
I have one column ('speed') with float values.
This is the code proposed by the tutorial to get a normalization layer:
# Import as used in the tutorial.
from tensorflow.keras.layers.experimental import preprocessing

def get_normalization_layer(name, dataset):
    # Create a Normalization layer for our feature.
    normalizer = preprocessing.Normalization()
    # Prepare a Dataset that only yields our feature.
    feature_ds = dataset.map(lambda x, y: x[name])
    # Learn the statistics of the data.
    normalizer.adapt(feature_ds)
    return normalizer
The tutorial then applies this function to its "PhotoAmt" column (the number of photos for a pet). I apply it in the same way to my "speed" column:
speed_col = train_features['speed']
layer = get_normalization_layer('speed', train_ds)
layer(speed_col)
I understand that their "PhotoAmt" column has Int values.
I get the following error:
/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py:184: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  accumulator.mean * accumulator.count for accumulator in accumulators
Traceback (most recent call last):
  File "keras_models.py", line 59, in <module>
    layer = get_normalization_layer('speed', train_ds)
  File "keras_models.py", line 54, in get_normalization_layer
    normalizer.adapt(feature_ds)
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/engine/base_preprocessing_layer.py", line 188, in adapt
    accumulator = self._combiner.compute(data_element, accumulator)
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py", line 173, in compute
    return self.merge([accumulator, sanitized_accumulator])
  File "/Users/myname/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/layers/preprocessing/normalization.py", line 184, in merge
    accumulator.mean * accumulator.count for accumulator in accumulators
ValueError: operands could not be broadcast together with shapes (5,) (2,)
Whilst the expected output from the tutorial is:
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[ 1.045485 ],
[-1.1339161 ],
[-0.19988704],
[ 0.11145599],
[ 0.42279902]], dtype=float32)>
(of course, I am expecting different numerical values)
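For reference, here is my understanding of what the normalization should produce, as a minimal pure-Python sketch. It assumes the layer subtracts the mean and divides by the population standard deviation, and it ignores any small epsilon the real layer may use. The `normalize` helper and the `speeds` values are made up for illustration and are not part of the tutorial.

```python
import math

def normalize(values):
    """Z-score a list of floats: (v - mean) / std (population std)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # All values identical: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical 'speed' values, not taken from my real dataset.
speeds = [10.0, 12.5, 9.0, 14.0, 11.5]
normalized = normalize(speeds)
print(normalized)
```

Here the statistics come from the same five values, so the result has zero mean and unit variance. In the tutorial the statistics are learned from the whole training set and then applied to a single batch, so the batch output would not necessarily be centred.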
I do not understand the error. Is it related to the fact that I am using floats instead of ints? Or are my column values malformed? I am quite sure that none of the rows contain an empty or otherwise invalid value in the 'speed' column.
I am using TensorFlow 2.2.0 and Python 3.7.
Thank you all.
I'm not sure why, but upgrading TensorFlow with:
pip3 install tensorflow --upgrade
solved it.
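To check which release is actually installed before and after the upgrade, something like the following sketch may help. It relies on `importlib.metadata`, which is in the standard library from Python 3.8; on 3.7 it assumes the `importlib_metadata` backport is installed. `MIN_VERSION = (2, 3)` is an assumed threshold for illustration, not something I have confirmed, and the helper names are my own.

```python
import re

try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # assumed backport on Python 3.7

# Assumed minimum (major, minor) release; adjust as needed.
MIN_VERSION = (2, 3)

def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_tuple(version):
    """Turn '2.2.0' or '2.3.0rc1' into a (major, minor) tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])

found = installed_version("tensorflow")
if found is None:
    print("tensorflow is not installed")
elif version_tuple(found) < MIN_VERSION:
    print("tensorflow", found, "is older than the assumed minimum")
else:
    print("tensorflow", found, "meets the assumed minimum")
```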
At least, I could move on to the next step of the tutorial:
# Numeric features.
for header in ['speed']:  # and other columns
    numeric_col = tf.keras.Input(shape=(1,), name=header)
    normalization_layer = get_normalization_layer(header, train_ds)
    encoded_numeric_col = normalization_layer(numeric_col)
    all_inputs.append(numeric_col)
    encoded_features.append(encoded_numeric_col)
which calls the same function as above without raising an error.
Note that the step on "Categorical features as Integers" still does not work for me, but I am satisfied since I do not have any such features.
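As for the original error, I can only guess. The line shown in the traceback (`accumulator.mean * accumulator.count for accumulator in accumulators`) suggests that `adapt` combines per-batch statistics with a count-weighted merge. The sketch below shows that idea in pure Python, assuming each batch contributes one scalar mean and a count. The helper names and the `batches` values are hypothetical.

```python
from statistics import mean

def batch_stats(batch):
    """Return (mean, count) for one batch of floats."""
    return mean(batch), len(batch)

def merge_means(stats):
    """Count-weighted merge of per-batch (mean, count) pairs."""
    total = sum(count for _, count in stats)
    return sum(m * count for m, count in stats) / total

# Hypothetical 'speed' values: a full batch of 5 and a final batch of 2.
batches = [[10.0, 12.5, 9.0, 14.0, 11.5], [13.0, 8.0]]
merged = merge_means([batch_stats(b) for b in batches])
overall = mean(v for b in batches for v in b)
print(merged, overall)  # the two values agree up to rounding
```

With scalar means the merge works for any batch sizes. The shapes `(5,)` and `(2,)` in the error message would be consistent with the older release keeping one mean per element of a batch instead of one scalar, so that a full batch and a shorter final batch could not be combined. I have not verified this against the TensorFlow source.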