
Unable to train from flow_from_dataframe: got unexpected number of classes


I am trying to train a model on a set of images whose labels are in a CSV file. I used flow_from_dataframe from tf.keras and specified the parameters, but with class_mode set to either 'sparse' or 'categorical' it reports "Found 3662 validated image filenames belonging to 1 classes." This is a multi-class classification problem.

Initially the labels were ints, so I converted them to strings, and then I got this output.

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df_train=pd.read_csv(r"../input/train.csv",delimiter=',')
df_test=pd.read_csv(r"../input/test.csv",delimiter=',')
print(df_train.head())
print(df_test.head())
df_train['id_code']=df_train['id_code']+'.png'
df_train['diagnosis']=str(df_train['diagnosis'])
df_test['id_code']=df_test['id_code']+'.png'

""" output is
        id_code  diagnosis
0  000c1434d8d7          2
1  001639a390f0          4
2  0024cdab0c1e          1
3  002c21358ce6          0
4  005b95c28852          0
        id_code
0  0005cfc8afb6
1  003f0afdcd15
2  006efc72b638
3  00836aaacf06
4  009245722fa4
"""

train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

TRAINING_DIR='../input/train_images'

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col='diagnosis',
    batch_size=20,
    target_size=(1050,1050),
    class_mode='categorical'  # also tried 'sparse'
)

""" output is
Found 3662 validated image filenames belonging to 1 classes.
"""

I expect the output "Found 3662 validated image filenames belonging to 5 classes", but the actual output is "Found 3662 validated image filenames belonging to 1 classes".


Solution

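  • As far as I know, both "sparse" and "categorical" class modes expect the label column to hold string values: "sparse" then yields integer class indices and "categorical" yields one-hot vectors. The problem in your code is that str(df_train['diagnosis']) turns the whole column into one single string, so every row gets the same label and the generator sees only 1 class. So I would try converting element by element:

    df_train['diagnosis'] = df_train['diagnosis'].astype(str)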
    

    and then use "sparse" class mode.

    train_generator= train_datagen.flow_from_dataframe(
        dataframe=df_train,
        directory=TRAINING_DIR,
        x_col='id_code',
        y_col='diagnosis',
        batch_size=20,
        target_size=(1050,1050),
        class_mode='sparse'
    )
    

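    Or alternatively you can one-hot encode the labels yourself, like this (the result has to be assigned back, and drop_first is left out so that all five classes keep a column):

    df_train = pd.get_dummies(df_train, columns=['diagnosis'], prefix='diagnosis', dtype='float32')
    

    And then pass the one-hot columns as y_col. In recent versions this should go with "raw" class_mode, which returns the column values as they are, because "categorical" expects a single label column:

    train_generator= train_datagen.flow_from_dataframe(
        dataframe=df_train,
        directory=TRAINING_DIR,
        x_col='id_code',
        y_col=list(df_train.columns[1:]),
        batch_size=20,
        target_size=(1050,1050),
        class_mode='raw'
    )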
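
To see why the conversion in the question collapses everything into one class, here is a minimal sketch that compares str(...) on a whole column with .astype(str). It assumes a small stand-in DataFrame built from the five rows printed in the question (the real train.csv is not used), and the names collapsed and converted are only for illustration.

```python
import pandas as pd

# Stand-in for train.csv: the five rows shown in the question's output.
df = pd.DataFrame({
    "id_code": ["000c1434d8d7", "001639a390f0", "0024cdab0c1e",
                "002c21358ce6", "005b95c28852"],
    "diagnosis": [2, 4, 1, 0, 0],
})

# str() on a whole Series returns ONE string (its printed form);
# assigning that scalar broadcasts the same value to every row.
collapsed = df.copy()
collapsed["diagnosis"] = str(collapsed["diagnosis"])
n_collapsed = collapsed["diagnosis"].nunique()

# astype(str) converts each element separately, so labels stay distinct.
converted = df.copy()
converted["diagnosis"] = converted["diagnosis"].astype(str)
n_converted = converted["diagnosis"].nunique()
labels = sorted(converted["diagnosis"].unique())

print(n_collapsed)  # prints 1
print(n_converted)  # prints 4 (the sample rows contain labels 0, 1, 2, 4)
print(labels)
```

Running the same nunique() check on the real df_train['diagnosis'] before calling flow_from_dataframe is a cheap way to confirm that the number of distinct labels matches the number of classes you expect.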