Unable to train with flow_from_dataframe: got unexpected number of classes

I want to train a model on a set of images whose labels are in a CSV file. I used flow_from_dataframe from tf.keras and set its parameters, but for both class_mode='sparse' and class_mode='categorical' it reports "Found 3662 validated image filenames belonging to 1 classes." This is a multi-class classification problem.

Initially the labels were ints, so I converted them to strings, and then I got this output.

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df_train=pd.read_csv(r"../input/train.csv",delimiter=',')
df_test=pd.read_csv(r"../input/test.csv",delimiter=',')
print(df_train.head())
print(df_test.head())
df_train['id_code']=df_train['id_code']+'.png'
df_train['diagnosis']=str(df_train['diagnosis'])
df_test['id_code']=df_test['id_code']+'.png'

""" output is
        id_code  diagnosis
0  000c1434d8d7          2
1  001639a390f0          4
2  0024cdab0c1e          1
3  002c21358ce6          0
4  005b95c28852          0
        id_code
0  0005cfc8afb6
1  003f0afdcd15
2  006efc72b638
3  00836aaacf06
4  009245722fa4
"""

train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

TRAINING_DIR='../input/train_images'

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col='diagnosis',
    batch_size=20,
    target_size=(1050,1050),
    class_mode='categorical'#used also sparsed
)

""" output is
Found 3662 validated image filenames belonging to 1 classes.
"""

I expected the output "Found 3662 validated image filenames belonging to 5 classes", but the actual output is "Found 3662 validated image filenames belonging to 1 classes".

Solution

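The single class comes from str(df_train['diagnosis']): str() turns the whole column into one string, which is then assigned to every row, so every image ends up with the same label. Convert each value separately instead:

df_train['diagnosis'] = df_train['diagnosis'].astype(str)

With flow_from_dataframe, both "sparse" and "categorical" expect the y_col values to be strings. "sparse" yields integer class indices and "categorical" yields one-hot vectors, so after this conversion either mode should report 5 classes. For example, with "sparse":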

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col='diagnosis',
    batch_size=20,
    target_size=(1050,1050),
    class_mode='sparse'
)
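To see why str(df_train['diagnosis']) in the question collapses everything into one class, here is a minimal sketch on a toy frame. The name df_demo and its values are made up for illustration and are not the real train.csv.

```python
import pandas as pd

# Toy stand-in for the question's train.csv (hypothetical values).
df_demo = pd.DataFrame({
    "id_code": ["a.png", "b.png", "c.png", "d.png"],
    "diagnosis": [2, 4, 1, 0],
})

# str() renders the whole Series as ONE string; assigning that scalar
# broadcasts the same string to every row.
broken = df_demo.copy()
broken["diagnosis"] = str(broken["diagnosis"])
n_broken = broken["diagnosis"].nunique()

# astype(str) converts element by element, so distinct labels survive.
fixed = df_demo.copy()
fixed["diagnosis"] = fixed["diagnosis"].astype(str)
n_fixed = fixed["diagnosis"].nunique()

print(n_broken, n_fixed)  # prints "1 4"
```

The generator counts distinct values in y_col, so one unique string per column means one class.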

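Alternatively, you can one-hot encode the labels yourself. Leave out drop_first=True, since it would drop one of the five class columns:

df_train = pd.get_dummies(df_train, columns=['diagnosis'], prefix='diagnosis', dtype='float32')

A list of columns in y_col does not work with "categorical", which expects a single label column. Use class_mode='raw' instead (older Keras releases called this mode 'other'), which passes the column values through unchanged:

train_generator = train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col=list(df_train.columns[1:]),  # the one-hot columns after id_code
    batch_size=20,
    target_size=(1050,1050),
    class_mode='raw'
)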
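As a rough illustration of the one-hot route, the sketch below runs pd.get_dummies on a toy frame. The name df_demo and its values are hypothetical. It leaves out drop_first, which would remove one class column, and it passes columns= so that only the label column is encoded.

```python
import pandas as pd

# Toy stand-in for the question's train.csv (hypothetical values).
df_demo = pd.DataFrame({
    "id_code": ["a.png", "b.png", "c.png", "d.png", "e.png"],
    "diagnosis": [2, 4, 1, 0, 3],
})

# Encode only the label column; id_code is left untouched as the first column.
one_hot = pd.get_dummies(
    df_demo, columns=["diagnosis"], prefix="diagnosis", dtype="float32"
)

# Everything after id_code is a label column, one per class.
label_cols = list(one_hot.columns[1:])
print(label_cols)
print(one_hot[label_cols].to_numpy())
```

Each row of the printed array contains exactly one 1.0, in the column matching that row's diagnosis.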