I am seeing 782 instead of 25000 when fitting a model. Is that OK?
Hello everyone, I am new to TensorFlow and am trying to fit a model on the IMDB dataset. This is how I loaded my data:
import tensorflow_datasets as tfds
import tensorflow as tf
import numpy as np

imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
train_data, test_data = imdb["train"], imdb["test"]

training_sentences = []
training_labels = []
testing_sentences = []
testing_labels = []

# Each element is a (text, label) pair of tensors; decode the byte string to text.
for s, l in train_data:
    training_sentences.append(s.numpy().decode("utf-8"))
    training_labels.append(l.numpy())
for s, l in test_data:
    testing_sentences.append(s.numpy().decode("utf-8"))
    testing_labels.append(l.numpy())
training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)
vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = "post"
oov_tok = "<OOV>"
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index

sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length, truncating=trunc_type)
And this is my model using Flatten:
my_model_with_flatten = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=6, activation='leaky_relu'),
    tf.keras.layers.Dense(units=1, activation='sigmoid'),
])
my_model_with_flatten.compile(optimizer='adam',
                              loss='binary_crossentropy', metrics=["accuracy"])
flatten_history = my_model_with_flatten.fit(x=padded, y=training_labels_final, epochs=10,
                                            validation_data=(testing_padded, testing_labels_final))
Output:
Epoch 1/10
782/782 [==============================] - 6s 7ms/step - loss: 0.4893 - accuracy: 0.7504 - val_loss: 0.3950 - val_accuracy: 0.8175
Epoch 2/10
782/782 [==============================] - 4s 6ms/step - loss: 0.2317 - accuracy: 0.9140 - val_loss: 0.4159 - val_accuracy: 0.8166
Epoch 3/10
782/782 [==============================] - 5s 6ms/step - loss: 0.0799 - accuracy: 0.9814 - val_loss: 0.5299 - val_accuracy: 0.8070
Epoch 4/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0198 - accuracy: 0.9978 - val_loss: 0.6216 - val_accuracy: 0.8051
Epoch 5/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0063 - accuracy: 0.9995 - val_loss: 0.6861 - val_accuracy: 0.8022
Epoch 6/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 0.7462 - val_accuracy: 0.8046
Epoch 7/10
782/782 [==============================] - 4s 6ms/step - loss: 7.7573e-04 - accuracy: 1.0000 - val_loss: 0.7966 - val_accuracy: 0.8045
Epoch 8/10
782/782 [==============================] - 5s 6ms/step - loss: 4.2356e-04 - accuracy: 1.0000 - val_loss: 0.8434 - val_accuracy: 0.8060
Epoch 9/10
782/782 [==============================] - 5s 6ms/step - loss: 2.5064e-04 - accuracy: 1.0000 - val_loss: 0.8865 - val_accuracy: 0.8057
Epoch 10/10
782/782 [==============================] - 5s 7ms/step - loss: 1.4970e-04 - accuracy: 1.0000 - val_loss: 0.9293 - val_accuracy: 0.8056
I think I should see 25000 here after every epoch instead of 782! Why am I seeing 782? Is it iterating over only 782 examples instead of 25000, or is everything OK and it's just a number?
Everything is fine; the progress bar is counting a different unit. The 782 is the number of batches (steps) per epoch, not the number of examples. model.fit uses a default batch_size of 32, and 25000 / 32 = 781.25, which rounds up to 782 steps, so all 25000 training examples are still processed every epoch.
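As a quick sanity check, here is a minimal sketch of that arithmetic (assuming the standard 25000-example IMDB training split; the info object is the same one returned by tfds.load in your code):

import math
import tensorflow_datasets as tfds

imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
num_examples = info.splits["train"].num_examples  # 25000
default_batch_size = 32  # what model.fit uses when batch_size is not given

# The progress bar counts batches; the final partial batch still counts as one step.
print(math.ceil(num_examples / default_batch_size))  # 782

If you want the step count to change, pass an explicit batch_size to fit; for example batch_size=64 would give ceil(25000 / 64) = 391 steps per epoch.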