I'm completely new to programming; I've only just started learning with various free tools, so I don't understand much about programming yet.
I'm trying to write a neural network as a self-study project.
The idea is as follows: I have 3 Excel files. The first (categ) has 1 column named category with 37 values.
The second (ex) has 2 columns: the first, called categ, and the second, called fix, each with 785 rows.
The third (match) has 1 column called match with 3543 rows.
I need the match file to get a second column, so that each of its values is assigned a category from the categ file, based on the example data in the ex file.
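Roughly, this is the structure I mean, with made-up values since I can't share the real data (I'm using the column names that appear in my code below):
import pandas as pd

# Toy stand-ins for my real files (values invented for illustration only)
df_categories = pd.DataFrame({'categ': ['A', 'B', 'C']})                   # categ.xlsx: 37 categories
df_examples = pd.DataFrame({'categ': ['A', 'B'],
                            'fix': ['example text 1', 'example text 2']})  # ex.xlsx: 785 labelled examples
df_to_distribute = pd.DataFrame({'match': ['new text 1', 'new text 2']})   # match.xlsx: 3543 rows to classify

# What I want in the end: a second column on the match data, where every row
# gets a category (taken from categ.xlsx) based on the examples in ex.xlsx
# df_to_distribute['categ'] = <predicted category for each row of 'match'>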
At the moment, I have this code:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.utils import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Reading downloaded Excel files
# File with categories
from google.colab import files
upload = files.upload()
!ls
df_categories = pd.read_excel('categ.xlsx', index_col=None)
print(df_categories.columns)
# File with examples
from google.colab import files
upload = files.upload()
!ls
df_examples = pd.read_excel('ex.xlsx', index_col=None)
print(df_examples.columns)
# File with values for distribution
from google.colab import files
upload = files.upload()
!ls
df_to_distribute = pd.read_excel('match.xlsx', index_col=None)
print(df_to_distribute.columns)
# Data preprocessing
categories = df_categories['categ'].tolist()
values = df_examples['fix'].tolist()
to_distribute = df_to_distribute['match'].tolist()
categories = [str(category) for category in categories]
values = [str(value) for value in values]
to_distribute = [str(item) for item in to_distribute]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(categories + values + to_distribute)
category_sequences = tokenizer.texts_to_sequences(categories)
value_sequences = tokenizer.texts_to_sequences(values)
to_distribute_sequences = tokenizer.texts_to_sequences(to_distribute)
max_length = max(len(seq) for seq in category_sequences + value_sequences + to_distribute_sequences)
padded_category_sequences = pad_sequences(category_sequences, maxlen=max_length, padding='post')
padded_value_sequences = pad_sequences(value_sequences, maxlen=max_length, padding='post')
padded_to_distribute_sequences = pad_sequences(to_distribute_sequences, maxlen=max_length, padding='post')
# Creating a model
input_layer = Input(shape=(max_length,))
embedding_layer = Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=64)(input_layer)
lstm_layer = LSTM(64)(embedding_layer)
output_layer = Dense(36, activation='softmax')(lstm_layer)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Model training
#model.fit(padded_to_distribute_sequences, padded_category_sequences, epochs=10, batch_size=32, validation_split=0.2)
#model.fit(np.array(data_list), np.array(y), verbose=0, epochs=100)
model.fit(np.array(padded_to_distribute_sequences), np.array(padded_category_sequences), verbose=0, epochs=100)
At the moment I'm getting the following error, and I don't know how to fix it:
ValueError Traceback (most recent call last)
<ipython-input-18-4e982bc70a7f> in <cell line: 37>()
35 #model.fit(padded_to_distribute_sequences, padded_category_sequences, epochs=10, batch_size=32, validation_split=0.2)
36 #model.fit(np.array(data_list), np.array(y), verbose=0, epochs=100)
---> 37 model.fit(np.array(padded_to_distribute_sequences), np.array(padded_category_sequences), verbose=0, epochs=100)
1 frames
/usr/local/lib/python3.10/dist-packages/keras/src/engine/data_adapter.py in _check_data_cardinality(data)
1958 )
1959 msg += "Make sure all arrays contain the same number of samples."
-> 1960 raise ValueError(msg)
1961
1962
ValueError: Data cardinality is ambiguous:
x sizes: 3549
y sizes: 36
Make sure all arrays contain the same number of samples.
I've tried changing lines of code based on recommendations from websites and forums, but nothing has helped so far. I would be glad of your help!
I'm writing the code in Google Colab.
Unfortunately, I can't share the original files I'm using, since they contain personal data, but I can give a brief summary so that the logic of my actions is clear. I've attached it above in the description.
I think the problem is that the target data padded_category_sequences and the input data padded_to_distribute_sequences have different numbers of samples, which causes the ValueError.
Add this after the "Data preprocessing" step:
target_data = np.tile(padded_category_sequences, (len(padded_to_distribute_sequences) // len(padded_category_sequences), 1))
I am assuming that padded_category_sequences is your target data.
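Either way, it's worth printing the shapes right before model.fit to confirm that both arrays have the same number of samples (a quick sanity check, using the variable names from your code):
import numpy as np

x = np.array(padded_to_distribute_sequences)
y = np.array(target_data)        # or padded_category_sequences, before tiling

print(x.shape)   # your traceback reports 3549 samples here
print(y.shape)   # model.fit needs the same value in the first dimension here

# Keras raises "Data cardinality is ambiguous" whenever x.shape[0] != y.shape[0]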