tensorflow keras conv-neural-network transfer-learning

CNN to output the percentage of 2 classes

I'm a beginner in this subject. My problem: I have to estimate the percentage of each of 2 classes in every frame of a video. I created a small dataset of about 500 images (250 per class) and a CNN with these layers:

import tensorflow as tf

# Four Conv2D + MaxPooling2D stages, then a dense classifier head
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
# Two sigmoid outputs, one per class (the setup the questions below are about)
model.add(tf.keras.layers.Dense(2, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001), metrics=['accuracy'])

1) For this problem, is it better to use binary_crossentropy + sigmoid or binary_crossentropy + softmax?

2) Is it better to use transfer learning/fine-tuning or to build a CNN from scratch like this?

3) I'm using ImageDataGenerator for data augmentation because the dataset is small. Is that right?

4) Which values can I use for batch_size, steps_per_epoch, learning_rate, ...? I noticed that the model's accuracy and val_accuracy reach 1.0 early, and the predictions don't return the correct percentage of each class but values like [9.999e-1 4.444e-5].

Solution

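Since yours is a binary classification problem, go with sigmoid. Softmax is the usual choice for multi-class problems (more than 2 classes); with 2 classes, a 2-unit softmax is equivalent to a single sigmoid unit, so a single sigmoid output is the simpler option.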
With only about 500 images, transfer learning is usually the better choice. Start from a pretrained network such as VGG16, ResNet, or Inception.
Yes, with small datasets, data augmentation helps a lot.
Use one neuron in the last layer rather than 2. With a single sigmoid neuron, the output is the probability of class 1: a value greater than 0.5 is treated as class 1, otherwise class 0. If you want to stick with two neurons, take np.argmax of the prediction to get the class. Note that two sigmoid outputs are independent and need not sum to 1. In the example you gave, pred = [9.999e-1 4.444e-5], the predicted class is 0, because pred[0] > pred[1].
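
To make the points above concrete, here is a minimal sketch, not a definitive implementation. It assumes a frozen VGG16 base with a single sigmoid output unit. The helper names build_transfer_model and to_percentages, the GlobalAveragePooling2D head, and the learning rate of 1e-4 are my own illustrative choices, not something taken from the question. It also assumes the input images have been preprocessed the way VGG16 expects (tf.keras.applications.vgg16.preprocess_input).

```python
import numpy as np
import tensorflow as tf


def build_transfer_model(weights="imagenet"):
    """Frozen VGG16 base plus a single sigmoid output unit (hypothetical helper)."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    base.trainable = False  # train only the new head at first

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.2),
        # One unit: the output is the estimated probability of class 1
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        # Learning rate is an assumption; tune it for your data
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        metrics=["accuracy"],
    )
    return model


def to_percentages(p_class1):
    """Convert sigmoid outputs of shape (n, 1) into rows of [% class 0, % class 1]."""
    p = np.asarray(p_class1, dtype=np.float64).reshape(-1)
    return np.stack([1.0 - p, p], axis=1) * 100.0


# With the original two-output model, the predicted class is the argmax
two_output_pred = np.array([9.999e-1, 4.444e-5])
predicted_class = int(np.argmax(two_output_pred))  # index of the larger value
```

After training, to_percentages(model.predict(frames)) gives one row per frame with the model's estimated percentage for each class. A value very close to 0 or 100 means the model is very confident about that frame, not that the output is wrong.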