I want to train a model on a set of images whose labels are stored in a CSV file, so I used flow_from_dataframe from tf.keras
and specified the parameters. But whichever class_mode I pass (I tried both sparse and categorical),
it reports: Found 3662 validated image filenames belonging to 1 classes.
This is a multi-class classification problem.
The labels were initially ints, so I converted them to strings, and then I got this output.
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator
df_train=pd.read_csv(r"../input/train.csv",delimiter=',')
df_test=pd.read_csv(r"../input/test.csv",delimiter=',')
print(df_train.head())
print(df_test.head())
df_train['id_code']=df_train['id_code']+'.png'
df_train['diagnosis']=str(df_train['diagnosis'])
df_test['id_code']=df_test['id_code']+'.png'
""" output is
id_code diagnosis
0 000c1434d8d7 2
1 001639a390f0 4
2 0024cdab0c1e 1
3 002c21358ce6 0
4 005b95c28852 0
id_code
0 0005cfc8afb6
1 003f0afdcd15
2 006efc72b638
3 00836aaacf06
4 009245722fa4
"""
train_datagen = ImageDataGenerator(
rescale = 1./255,
rotation_range=30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
TRAINING_DIR='../input/train_images'
train_generator= train_datagen.flow_from_dataframe(
dataframe=df_train,
directory=TRAINING_DIR,
x_col='id_code',
y_col='diagnosis',
batch_size=20,
target_size=(1050,1050),
class_mode='categorical'  # also tried 'sparse'
)
""" output is
Found 3662 validated image filenames belonging to 1 classes.
"""
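I expect the output "Found 3662 validated image filenames belonging to 5 classes",
but the actual output is "Found 3662 validated image filenames belonging to 1 classes".
In flow_from_dataframe, both the "sparse" and the "categorical" class mode expect the label column to contain strings; "sparse" then yields integer class indices and "categorical" yields one-hot vectors. The problem is str(df_train['diagnosis']): it turns the whole column into one single string, so every row gets the same label and Keras finds only 1 class. Convert each value separately instead:
df_train['diagnosis'] = df_train['diagnosis'].astype(str)
and then use "sparse" class mode.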
train_generator= train_datagen.flow_from_dataframe(
dataframe=df_train,
directory=TRAINING_DIR,
x_col='id_code',
y_col='diagnosis',
batch_size=20,
target_size=(1050,1050),
class_mode='sparse'
)
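To see why the original conversion collapses everything into one class, here is a minimal sketch that uses only pandas. The five label values are a hypothetical stand-in for the diagnosis column, not the real train.csv.

```python
import pandas as pd

# Hypothetical stand-in for the 'diagnosis' column of train.csv.
labels = pd.Series([2, 4, 1, 0, 0], name="diagnosis")

# str() on a whole Series returns ONE string (its printed representation).
# Assigning that scalar to a column broadcasts it to every row.
broadcast = pd.DataFrame({"diagnosis": labels})
broadcast["diagnosis"] = str(labels)

# astype(str) converts each element on its own, so distinct labels survive.
elementwise = labels.astype(str)

n_classes_broadcast = broadcast["diagnosis"].nunique()
n_classes_elementwise = elementwise.nunique()
print(n_classes_broadcast, n_classes_elementwise)  # prints: 1 4
```

The first count is what flow_from_dataframe sees after str(...), which matches the "1 classes" message.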
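Or alternatively you can one-hot encode the labels yourself. Leave out drop_first=True here, because it would drop one of the classes:
df_train = pd.get_dummies(df_train, columns=['diagnosis'], dtype='float32')
And then pass the one-hot columns as y_col. As far as I know "categorical" expects a single label column, so with a list of columns use the "raw" class_mode, which returns the column values unchanged:
train_generator= train_datagen.flow_from_dataframe(
dataframe=df_train,
directory=TRAINING_DIR,
x_col='id_code',
y_col=list(df_train.columns[1:]),
batch_size=20,
target_size=(1050,1050),
class_mode='raw'
)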
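As a rough illustration of the one-hot route, the sketch below runs pd.get_dummies on a tiny made-up frame (the file names and labels are hypothetical) and shows what drop_first=True would do to one of the classes.

```python
import pandas as pd

# Hypothetical miniature of df_train: file names plus integer labels.
df = pd.DataFrame({
    "id_code": ["a.png", "b.png", "c.png", "d.png"],
    "diagnosis": [2, 4, 1, 0],
})

# One column per class that occurs in the data; each row has exactly one 1.0.
one_hot = pd.get_dummies(df, columns=["diagnosis"], dtype="float32")
label_cols = [c for c in one_hot.columns if c.startswith("diagnosis_")]
row_sums = one_hot[label_cols].sum(axis=1).tolist()

# With drop_first=True the first class loses its column, so rows of that
# class become all zeros and can no longer be told apart as a class.
dropped = pd.get_dummies(df, columns=["diagnosis"], drop_first=True, dtype="float32")
dropped_cols = [c for c in dropped.columns if c.startswith("diagnosis_")]
dropped_sums = dropped[dropped_cols].sum(axis=1).tolist()

print(label_cols)
print(dropped_cols)
```

Note that get_dummies only creates columns for labels that are present, so the real frame gets five columns only if all five diagnosis values occur in it.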