Tags: python, tensorflow, keras, neural-network, google-colaboratory

Keras Not Modeling All Rows


I'm brand new to Keras, TensorFlow, and Google Colab, and I'm following a YouTube tutorial. I have a DataFrame (df) with 4000 rows that looks like this:

           x            y      color
0       1.752690    2.846610    0.0
1       0.848488    2.127556    0.0
2       2.294166    0.801233    1.0
3       4.137304    3.082904    1.0
4       2.877251    1.915737    1.0
...     ...         ...         ...
3995    4.138087    5.111319    0.0
3996    0.840928    1.691174    0.0
3997    2.820071    3.812626    0.0
3998    3.313544    4.869823    0.0
3999    3.877675    2.553817    1.0

Here is my code:

import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np

df=pd.read_csv('train.csv')
np.random.shuffle(df.values)

model = keras.Sequential([
  keras.layers.Dense(4, input_shape=(2,), activation='relu'),
  keras.layers.Dense(2, activation='sigmoid')])
model.compile(optimizer='adam',loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(df.loc[:,'x':'y'].values, df.color.values, batch_size=16)

Here is the output:

250/250 [==============================] - 0s 872us/step - loss: 0.8335 - accuracy: 0.2380
<tensorflow.python.keras.callbacks.History at 0x7fed96a646a0>

I'm concerned that the output implies it only ran 250 rows. The output shown in the YouTube video has a few rows, and the bottom row says 4000/4000, followed by [Finished in 4.5s]. You can see that output at 29:13 in https://www.youtube.com/watch?v=aBIGJeHRZLQ.

Is my model only running 250 rows? Why?


Solution

  • The number shown is not the number of rows. It is the number of batches (steps) processed. In model.fit you specified a batch size of 16, so with 4000 rows it takes 4000/16 = 250 steps per epoch to process all 4000 rows. The model is training on every row.
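
The arithmetic can be checked with a small sketch. The helper name steps_per_epoch below is hypothetical (it is not a Keras function). It assumes the progress bar counts batches and that a final partial batch still counts as one step, hence the rounding up:

```python
import math


def steps_per_epoch(num_rows: int, batch_size: int) -> int:
    """Hypothetical helper: batches needed to cover every row once.

    Rounds up because a final, smaller batch still counts as a step.
    """
    return math.ceil(num_rows / batch_size)


# The question's setup: 4000 rows with batch_size=16.
print(steps_per_epoch(4000, 16))  # 250

# With the default batch_size of 32 in model.fit, the same data
# would take fewer, larger steps.
print(steps_per_epoch(4000, 32))  # 125
```

If the progress bar should read 4000/4000, passing batch_size=1 to model.fit would give one step per row, at the cost of much slower training. The video's 4000/4000 is likely from an older Keras version that reported samples rather than batches.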