Tags: python, tensorflow, keras, tensorflow-datasets

Model was constructed with shape (None, 65536) but it was called on an input with incompatible shape (None, 65536, None)


For reference, the full error is here:

WARNING:tensorflow:Model was constructed with shape (None, 65536) for input KerasTensor(type_spec=TensorSpec(shape=(None, 65536), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, 65536, None).

I am using kymatio to classify audio signals. Before constructing the model, I use TensorFlow's tf.keras.utils.audio_dataset_from_directory to create the training and testing sets.

The audio samples are of shape (65536,) before the sets are created. To create the sets, I use the following code:

import tensorflow as tf

T = 2**16        # samples per clip (matches output_sequence_length below)
J = 8            # maximum scattering scale (averaging over 2**J samples)
Q = 12           # wavelets per octave
log_eps = 1e-6   # small constant to stabilize the log of the scattering coefficients
SEED = 42        # shuffle seed for the dataset

train_dataset = tf.keras.utils.audio_dataset_from_directory(
    '../train',
    labels='inferred',
    label_mode='int',
    class_names=['x', 'y', 'z', 'xy', 'xz', 'yz', 'xyz'],
    batch_size=32,
    output_sequence_length=T,
    ragged=False,
    shuffle=True,
    seed=SEED,
    follow_links=False
)

The element_spec of the train_dataset is (TensorSpec(shape=(None, 65536, None), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None)).

So at some point the shape in the TensorSpec changes to (None, 65536, None), and I don't understand why.
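
A quick way to see where the extra axis comes from is to pull one batch and print its shape; for mono files this should be something like (32, 65536, 1), i.e. (batch_size, sequence_length, num_channels):

for audio_batch, label_batch in train_dataset.take(1):
    print(audio_batch.shape, label_batch.shape)
print(train_dataset.element_spec)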

The model is constructed as follows, and the error points to the model.fit(...) call.

from tensorflow.keras import layers
from kymatio.keras import Scattering1D  # Keras frontend of kymatio

x_in = layers.Input(shape=(T,))
x = Scattering1D(J, Q=Q)(x_in)                                    # 1D scattering transform of the waveform
x = layers.Lambda(lambda x: x[..., 1:, :])(x)                     # drop the zeroth-order coefficient
x = layers.Lambda(lambda x: tf.math.log(tf.abs(x) + log_eps))(x)  # log-scattering features
x = layers.GlobalAveragePooling1D(data_format='channels_first')(x)
x = layers.BatchNormalization(axis=1)(x)
x_out = layers.Dense(7, activation='softmax')(x)
model = tf.keras.models.Model(x_in, x_out)
model.summary()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=50)

Solution

  • Check the docs regarding tf.keras.utils.audio_dataset_from_directory:

    [...] audio has shape (batch_size, sequence_length, num_channels)

    Just use tf.squeeze to remove the extra channel dimension if you are only working with single-channel audio (a fuller end-to-end sketch follows at the end of this answer):

    train_dataset = train_dataset.map(lambda x, y: (tf.squeeze(x, axis=-1), y))
    

    If you want to keep the dimension, try:

    x_in = layers.Input(shape=(T, 1))
    

    I would recommend going through this tutorial.
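
    As a minimal end-to-end sketch of the squeeze fix (assuming the train_dataset built in the question and single-channel audio; num_parallel_calls is optional and only an assumption for throughput):

    import tensorflow as tf

    # Drop the trailing channel axis so each example is (65536,) instead of (65536, 1),
    # matching the (None, 65536) input the model was built with.
    train_dataset = train_dataset.map(
        lambda x, y: (tf.squeeze(x, axis=-1), y),
        num_parallel_calls=tf.data.AUTOTUNE,
    )

    # The element_spec should now read:
    # (TensorSpec(shape=(None, 65536), dtype=tf.float32, name=None),
    #  TensorSpec(shape=(None,), dtype=tf.int32, name=None))
    print(train_dataset.element_spec)

    # After this, model.fit(train_dataset, epochs=50) should no longer emit the warning.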