I am using keras.layers.Normalization for preprocessing a CSV dataset returned from make_csv_dataset. The execution freezes at the adapt(ds) call. No error is printed; it just runs adapt forever. I tried using pandas for normalization instead, and it completed in seconds.
System info:
import tensorflow as tf
from tensorflow import keras

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = keras.utils.get_file("iris.csv", origin=url)
    ds = tf.data.experimental.make_csv_dataset(
        fpath, header=False, label_name=label[0], column_names=columns,
        batch_size=10, shuffle=True, ignore_errors=True)
    return ds

ds = get_data()
# Stack the four feature columns into a single (batch, 4) tensor and drop the label
ds_features = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))
norm = keras.layers.Normalization(axis=-1)
norm.adapt(ds_features)  # execution hangs here
print("adapt completed")
You have to set the num_epochs parameter of make_csv_dataset to 1. Its default value is None, which makes the dataset repeat forever, so adapt never runs out of batches to read. As the docs state:
An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
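To see why this hangs, here is a plain-Python analogy (an illustration only, not how Keras actually implements adapt). The hypothetical adapt_like function below, like adapt, keeps reading batches until its input is exhausted and then computes per-feature mean and variance. A finite list of batches plays the role of num_epochs=1, while itertools.cycle plays the role of num_epochs=None.

```python
import itertools

def adapt_like(batches):
    """Hypothetical stand-in for Normalization.adapt: consumes *every* batch
    until the iterable is exhausted, then returns per-feature mean and variance."""
    count = 0
    sums = None
    sq_sums = None
    for batch in batches:
        for row in batch:
            if sums is None:
                sums = [0.0] * len(row)
                sq_sums = [0.0] * len(row)
            for i, value in enumerate(row):
                sums[i] += value
                sq_sums[i] += value * value
            count += 1
    means = [s / count for s in sums]
    # Population variance: E[x^2] - E[x]^2
    variances = [sq / count - m * m for sq, m in zip(sq_sums, means)]
    return means, variances

# Toy data (made up): two batches of two rows with two features each.
batches = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 30.0], [4.0, 40.0]]]

# Like num_epochs=1: a single pass, so the loop ends.
means, variances = adapt_like(batches)

# Like num_epochs=None: itertools.cycle(batches) never ends, so passing it
# directly would loop forever. Bounding it (here with islice) makes it finite.
bounded = itertools.islice(itertools.cycle(batches), len(batches))
means_bounded, _ = adapt_like(bounded)
```

Calling adapt_like(itertools.cycle(batches)) without the islice bound would never return, which is the same symptom as in the question.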
Working example:
import tensorflow as tf

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = tf.keras.utils.get_file("iris.csv", origin=url)
    # num_epochs=1: read the file once instead of repeating it forever
    ds = tf.data.experimental.make_csv_dataset(
        fpath, header=False, label_name=label[0], column_names=columns,
        num_epochs=1, batch_size=10, shuffle=True, ignore_errors=True)
    return ds

ds = get_data()
# Stack the four feature columns into a single (batch, 4) tensor and drop the label
ds_feature = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(ds_feature)  # finishes after one pass over the file
print("adapt completed")
Output:
adapt completed
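If the dataset needs to keep repeating (for example because the same pipeline is later used for training), another option is to leave it infinite and bound only what you pass to adapt with Dataset.take. A minimal sketch, assuming a small in-memory array in place of the iris CSV (the values are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the iris features: 6 rows x 2 features.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
                 [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]], dtype="float32")

# Endlessly repeating dataset, analogous to make_csv_dataset with num_epochs=None.
infinite_ds = tf.data.Dataset.from_tensor_slices(data).batch(2).repeat()

# take(3) yields exactly 3 batches of 2 rows = one pass over the data,
# so adapt sees a finite dataset and returns.
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(infinite_ds.take(3))

# After adapting, each feature column should have mean ~0 and std ~1.
normalized = norm(data).numpy()
print(normalized.mean(axis=0), normalized.std(axis=0))
```

Here take(3) happens to cover exactly one pass because the toy array size is known; with the real CSV you would have to pick the number of batches yourself, which is why setting num_epochs=1 is usually the simpler fix.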