I am trying to run DeepLab v3+ (the standard TensorFlow version) on some remote sensing data to perform a binary classification (hedge or no hedge), but the output is very strange, which leads me to believe that something is going wrong with the reading of my input data.
After running the vis.py script I get the following as the 000000_image.png output in the segmentation_results folder. As I understand it, the file named xxxx_image is supposed to be the original image? The pixel values here range from 0-3; in other images they range from 0-7.
But my original images look like this (not the exact same file, just an example of the original data so you get an idea).
This folder also contains the prediction files:
Thus I assume the prediction is the classification and the image is the original file. Any idea why I am getting this as the original file?
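For reference, this is roughly how the value range can be checked (a minimal sketch; the filename is just an example from the results folder):

# Inspect the value range of a vis.py output image
# (the path below is just an example).
import numpy as np
from PIL import Image

image = np.array(Image.open('segmentation_results/000000_image.png'))
print(image.dtype, image.min(), image.max(), np.unique(image))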
To build the TFRecords data I use the following script:
import math
import os.path
import sys
import build_data
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('image_folder',
                           './VOCdevkit/VOC2012/JPEGImages',
                           'Folder containing images.')

tf.app.flags.DEFINE_string(
    'semantic_segmentation_folder',
    './VOCdevkit/VOC2012/SegmentationClassRaw',
    'Folder containing semantic segmentation annotations.')

tf.app.flags.DEFINE_string(
    'list_folder',
    './VOCdevkit/VOC2012/ImageSets/Segmentation',
    'Folder containing lists for training and validation.')

tf.app.flags.DEFINE_string(
    'output_dir',
    './tfrecord',
    'Path to save converted SSTable of TensorFlow examples.')

# These two flags are referenced below (FLAGS.image_format and
# FLAGS.label_format) and so must be defined; my data uses png for both.
tf.app.flags.DEFINE_enum('image_format', 'png', ['jpg', 'jpeg', 'png'],
                         'Image format.')

tf.app.flags.DEFINE_enum('label_format', 'png', ['png'],
                         'Segmentation label format.')

_NUM_SHARDS = 4


def _convert_dataset(dataset_split):
  """Converts the specified dataset split to TFRecord format.

  Args:
    dataset_split: The dataset split (e.g., train, test).

  Raises:
    RuntimeError: If loaded image and label have different shape.
  """
  dataset = os.path.basename(dataset_split)[:-4]
  sys.stdout.write('Processing ' + dataset)
  filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
  num_images = len(filenames)
  num_per_shard = int(math.ceil(num_images / float(_NUM_SHARDS)))

  image_reader = build_data.ImageReader('png', channels=3)
  label_reader = build_data.ImageReader('png', channels=1)

  for shard_id in range(_NUM_SHARDS):
    output_filename = os.path.join(
        FLAGS.output_dir,
        '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
    with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
      start_idx = shard_id * num_per_shard
      end_idx = min((shard_id + 1) * num_per_shard, num_images)
      for i in range(start_idx, end_idx):
        sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
            i + 1, len(filenames), shard_id))
        sys.stdout.flush()
        # Read the image.
        image_filename = os.path.join(
            FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
        image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
        height, width = image_reader.read_image_dims(image_data)
        # Read the semantic segmentation annotation.
        seg_filename = os.path.join(
            FLAGS.semantic_segmentation_folder,
            filenames[i] + '.' + FLAGS.label_format)
        seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
        seg_height, seg_width = label_reader.read_image_dims(seg_data)
        if height != seg_height or width != seg_width:
          raise RuntimeError('Shape mismatched between image and label.')
        # Convert to tf example.
        example = build_data.image_seg_to_tfexample(
            image_data, filenames[i], height, width, seg_data)
        tfrecord_writer.write(example.SerializeToString())
    sys.stdout.write('\n')
    sys.stdout.flush()


def main(unused_argv):
  dataset_splits = tf.gfile.Glob(os.path.join(FLAGS.list_folder, '*.txt'))
  for dataset_split in dataset_splits:
    _convert_dataset(dataset_split)


if __name__ == '__main__':
  tf.app.run()
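To rule out the conversion step, a quick round-trip check can be run on one of the generated shards (a sketch; it assumes the standard 'image/encoded' feature key written by build_data.image_seg_to_tfexample, and the shard name is just an example):

# Parse one record back and confirm the encoded image still decodes as 16-bit.
import tensorflow as tf

record_path = './tfrecord/train-00000-of-00004.tfrecord'  # example shard
example = tf.train.Example()
example.ParseFromString(next(tf.python_io.tf_record_iterator(record_path)))
encoded = example.features.feature['image/encoded'].bytes_list.value[0]

with tf.Session() as sess:
  image = sess.run(tf.image.decode_png(encoded, channels=3, dtype=tf.uint16))
print(image.dtype, image.min(), image.max())  # expect uint16, values above 255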
In the build_data.py script I have changed one detail, as my input data is 16-bit (uint16) PNG:
      elif self._image_format == 'png':
        self._decode = tf.image.decode_png(self._decode_data,
                                           channels=channels,
                                           dtype=tf.uint16)
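To confirm the change takes effect, the patched reader can be exercised directly (a sketch; it assumes the decode_image method that build_data.ImageReader exposes, and 'tile_0001.png' is a hypothetical filename):

# Check that the patched ImageReader now preserves 16-bit pixel values.
import tensorflow as tf
import build_data

reader = build_data.ImageReader('png', channels=3)
image_data = tf.gfile.FastGFile('tile_0001.png', 'rb').read()
image = reader.decode_image(image_data)  # returns a numpy array
print(image.dtype, image.max())  # expect uint16 and a max above 255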
To train I use the script at this link (it is a bit large to paste here): https://github.com/tensorflow/models/blob/master/research/deeplab/train.py
For the visualization, which produces the output shown above, I use the script found here: https://github.com/tensorflow/models/blob/master/research/deeplab/vis.py
If anyone has some insight I would greatly appreciate it.
I fixed it. It turns out these models are not built to take 16-bit data as input, so you need to change the image decoders to explicitly read the image as 16-bit. There are a number of places where you need to do this in the data-generation scripts, as well as in export_model.py; otherwise your inference images later on will also be messed up.
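The recurring change looks roughly like this (a sketch with illustrative names, not the exact upstream identifiers; wherever the pipeline decodes the raw image bytes as uint8, make it uint16):

import tensorflow as tf

# 'encoded_image' stands for the raw PNG bytes read from disk or a record.
encoded_image = tf.placeholder(tf.string)

# Force a 16-bit decode instead of the default uint8, then cast to float
# for the model's preprocessing.
image = tf.image.decode_png(encoded_image, channels=3, dtype=tf.uint16)
image = tf.cast(image, tf.float32)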
As for the output image produced by vis.py: in save_annotation.py I had to change the final image writer so it uses cv2 when writing the original image, and keeps the normal PIL method when writing the mask:
# cv2 writes 16-bit PNGs natively, so it is used for the original image;
# the normal PIL path is kept for the 8-bit masks.
# (This fragment assumes: import cv2, import numpy as np,
#  from PIL import Image as img.)
if original:
  cv2.imwrite('%s/%s.png' % (save_dir, filename),
              colored_label.astype(np.uint16))
else:
  pil_image = img.fromarray(colored_label.astype(dtype=np.uint8))
  with tf.gfile.Open('%s/%s.png' % (save_dir, filename), mode='w') as f:
    pil_image.save(f, 'PNG')
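The reason for the split: the PIL path here converts the array to uint8, which clips 16-bit data down to 0-255, whereas cv2.imwrite writes a proper 16-bit PNG when handed a uint16 array, so the saved "original" image keeps its full value range.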