
Ragged tensors as input for LSTM


I'm learning about ragged tensors and how to use them with TensorFlow. Here is my example:

import numpy as np
import tensorflow as tf

xx = tf.ragged.constant([
                        [0.1, 0.2],
                        [0.4, 0.7, 0.5, 0.6]
                        ])
yy = np.array([[0, 0, 1], [1, 0, 0]])

mdl = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=[None], batch_size=2, dtype=tf.float32, ragged=True),
    tf.keras.layers.LSTM(64),  
    tf.keras.layers.Dense(3, activation='softmax')
])

mdl.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

mdl.summary()
history = mdl.fit(xx, yy, epochs=10)

The error

Input 0 of layer lstm_152 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [2, None]

I am not sure whether I can use ragged tensors like this. An LSTM expects 3-D input of shape (batch, timesteps, features), while my ragged tensor is only 2-D. All the examples I found put an Embedding layer before the LSTM, but I don't want to create an additional embedding layer.
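
For reference, the pattern those examples use looks roughly like the sketch below; it sidesteps the problem because Embedding maps a 2-D ragged batch of token IDs with shape (batch, None) to the 3-D (batch, None, features) input the LSTM expects. The vocabulary size, token IDs, and labels here are made up for illustration:

import numpy as np
import tensorflow as tf

# Hypothetical integer token IDs; 2-D ragged tensor of shape (2, None)
tokens = tf.ragged.constant([[1, 2], [3, 4, 5, 6]])
labels = np.array([2, 0])  # sparse integer class labels (assumed)

emb_mdl = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[None], dtype=tf.int32, ragged=True),
    # Embedding adds the feature axis: (batch, None) -> (batch, None, 8)
    tf.keras.layers.Embedding(input_dim=10, output_dim=8),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(3, activation='softmax')
])
emb_mdl.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
emb_mdl.fit(tokens, labels, epochs=1)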


Solution

  • I recommend using the Input layer rather than InputLayer; you usually don't need InputLayer at all. In any case, the problem is that the shape of your input did not match the input shape the LSTM layer expects. Here is the modification I made, with some comments.


    import numpy as np
    import tensorflow as tf

    # xx should be 3-D for the LSTM
    xx = tf.ragged.constant([
                            [[0.1, 0.2]],
                            [[0.4, 0.7, 0.5, 0.6]]
                            ])

    """
    The labels are one-hot encoded, so you should use
    CategoricalCrossentropy instead of SparseCategoricalCrossentropy.
    """

    yy = np.array([[0, 0, 1], [1, 0, 0]])

    # For a ragged tensor, get the maximum sequence length
    # (bounding_shape returns a tensor, so convert it to a plain int)
    max_seq = int(xx.bounding_shape()[-1])

    mdl = tf.keras.Sequential([
        # Input layer with shape = [any number of timesteps, maximum sequence length]
        tf.keras.layers.Input(shape=[None, max_seq], batch_size=2, dtype=tf.float32, ragged=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(3, activation='softmax')
    ])

    # CategoricalCrossentropy; the Dense layer already applies softmax,
    # so the model outputs probabilities rather than logits
    mdl.compile(loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                optimizer=tf.keras.optimizers.Adam(1e-4),
                metrics=['accuracy'])

    mdl.summary()
    history = mdl.fit(xx, yy, epochs=10)
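
As a quick sanity check, continuing from the snippet above: once fit succeeds, the model can be queried with the same ragged batch, and predict returns a dense array of class probabilities, one row per sequence.

    # Usage sketch, continuing from the code above: predict on the
    # ragged training batch; the output is a dense (2, 3) array of
    # class probabilities, one row per sequence
    probs = mdl.predict(xx)
    print(probs.shape)  # (2, 3)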