python tensorflow machine-learning keras tensorflow-datasets

Using keras.layers.Normalization for preprocessing, the adapt call freezes

I am using keras.layers.Normalization to preprocess a CSV dataset returned by make_csv_dataset. Execution freezes at the adapt(ds) call: no error is raised and nothing is printed, adapt just runs forever. Normalizing the same data with pandas completed in seconds.

System info:

tensorflow 2.7.0
cuda 11.0
3080ti mobile
i9-10980HK CPU @ 2.40GHz, 3096 MHz, 8 cores, 16 logical processors
Microsoft Windows 11 Home

import tensorflow as tf
from tensorflow import keras

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = keras.utils.get_file("iris.csv", origin=url)
    ds = tf.data.experimental.make_csv_dataset(
        fpath, header=False, label_name=label[0], column_names=columns,
        batch_size=10, shuffle=True, ignore_errors=True)
    return ds


ds = get_data()
ds_features = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))

norm = keras.layers.Normalization(axis=-1)
norm.adapt(ds_features)

print("adapt completed")

Solution

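You have to set the num_epochs parameter of make_csv_dataset to 1. Its default value is None, which makes the dataset repeat indefinitely. adapt iterates over the whole dataset to compute the mean and variance, so on an endless dataset it never returns. The docs describe num_epochs as: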

An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
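The effect of that default can be seen on a small stand-in dataset. This is only a sketch: the toy `range` dataset and the variable names are made up for illustration, and `repeat()` is used here as an analogue of `num_epochs=None`. The real make_csv_dataset pipeline may report an unknown cardinality instead (for example with ignore_errors=True), so treat this as an illustration rather than a reliable diagnostic.

```python
import tensorflow as tf

# Toy dataset standing in for the CSV rows (hypothetical data, not the iris file).
base = tf.data.Dataset.range(6).batch(2)  # 3 batches of 2 elements

forever = base.repeat()    # analogous to num_epochs=None: never ends
one_pass = base.repeat(1)  # analogous to num_epochs=1: a single pass

# cardinality() returns a scalar tensor; INFINITE_CARDINALITY marks an endless dataset.
forever_is_infinite = forever.cardinality().numpy() == tf.data.INFINITE_CARDINALITY
one_pass_batches = int(one_pass.cardinality().numpy())

print("repeat() is infinite:", forever_is_infinite)
print("repeat(1) batches:", one_pass_batches)
```

Anything that tries to consume `forever` to the end, as adapt does, will not finish.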

Working example:

import tensorflow as tf

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = tf.keras.utils.get_file("iris.csv", origin=url)
    # num_epochs=1 makes the dataset finite, so adapt() can reach the end.
    ds = tf.data.experimental.make_csv_dataset(
        fpath, header=False, label_name=label[0], column_names=columns,
        num_epochs=1, batch_size=10, shuffle=True, ignore_errors=True)
    return ds


ds = get_data()
ds_feature = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))

norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(ds_feature)

print("adapt completed")

adapt completed
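If you want to keep a repeating dataset (for example to feed training later), another option is to hand adapt a bounded view of it with take(). The sketch below uses a small in-memory tensor instead of the iris file, and the batch count passed to take() is an assumption you would have to work out for your own data (rows divided by batch size).

```python
import tensorflow as tf

# Hypothetical stand-in for the feature columns: 6 rows, 2 features.
data = tf.constant(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
     [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
)

# An endlessly repeating dataset; calling adapt() on it directly would hang.
infinite_ds = tf.data.Dataset.from_tensor_slices(data).batch(2).repeat()

# take(3) keeps 3 batches of 2 rows, i.e. exactly one pass over this toy data.
bounded_ds = infinite_ds.take(3)

norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(bounded_ds)  # terminates because bounded_ds is finite

normalized = norm(data)
print(tf.reduce_mean(normalized, axis=0).numpy())  # per-feature means, close to zero
```

With the real CSV pipeline, setting num_epochs=1 as above is simpler. take() is mainly useful when the same repeating dataset has to be reused elsewhere.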