Tags: python, image, tensorflow, machine-learning, pytorch

Human segmentation fails with PyTorch, but not with TensorFlow Keras


I probably missed something, but here is the same workflow with PyTorch and TensorFlow Keras.

The results are here:

[Image: segmentation result with PyTorch]

[Image: segmentation result with TensorFlow Keras]

It's hard to explain the whole process, but this is what I do with PyTorch:

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

# img_size and colormap come from earlier in the notebook;
# colormap is an (n_classes, 3) uint8 array of RGB colors
img_size = 512  # assumed here for completeness

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").to(device)

# Put the model in evaluation mode
model.eval()

# Transformations similar to the ones used in Keras
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((img_size, img_size)),
])

def t_read_image(image: np.ndarray):
    image = transform(image)
    # rescale from [0, 1] to [-1, 1]
    print(image.min(), image.max())
    image = image * 2 - 1
    image = image.to(device)
    return image

def t_infer(model, image):
    with torch.no_grad():
        output = model(image.unsqueeze(0))

    output = output['out']
    output = np.squeeze(output.cpu().numpy())

    # drop the background channel, then take the most likely class per pixel
    output = output[1:]
    output = np.argmax(output, axis=0)
    return output

def t_decode_segmentation_masks(mask, colormap, n_classes):
    r = np.zeros_like(mask).astype(np.uint8)
    g = np.zeros_like(mask).astype(np.uint8)
    b = np.zeros_like(mask).astype(np.uint8)
    for l in range(0, n_classes):
        idx = mask == l
        r[idx] = colormap[l, 0]
        g[idx] = colormap[l, 1]
        b[idx] = colormap[l, 2]
    rgb = np.stack([r, g, b], axis=2)
    return rgb

def t_get_overlay(image, colored_mask):
    image = image.cpu().numpy()
    # rescale to [0, 255] and reorder from CHW to HWC
    image = (image - image.min()) / (image.max() - image.min()) * 255
    image = image.astype(np.uint8)
    image = np.transpose(image, (1, 2, 0))
    overlay = cv2.addWeighted(image, 0.35, colored_mask, 0.65, 0)
    return overlay

def t_segmentation(input_image: np.ndarray):
    image_tensor = t_read_image(input_image)
    prediction_mask = t_infer(image=image_tensor, model=model)
    prediction_colormap = t_decode_segmentation_masks(prediction_mask, colormap, 20)
    overlay = t_get_overlay(image_tensor, prediction_colormap)
    return (overlay, prediction_colormap)


img = Image.open('./image.jpg')
img = np.array(img)
overlay, segs = t_segmentation(img)
plt.imshow(overlay)
plt.show()

And the same thing with Keras:

import tensorflow as tf
from huggingface_hub import from_pretrained_keras

model = from_pretrained_keras("keras-io/deeplabv3p-resnet50")

def read_image(image):
    image = tf.convert_to_tensor(image)
    image.set_shape([None, None, 3])
    image = tf.image.resize(images=image, size=[img_size, img_size])
    image = image / 127.5 - 1
    return image

def infer(model, image_tensor):
    predictions = model.predict(np.expand_dims((image_tensor), axis=0))
    predictions = np.squeeze(predictions)
    predictions = np.argmax(predictions, axis=2)
    return predictions

def decode_segmentation_masks(mask, colormap, n_classes):
    r = np.zeros_like(mask).astype(np.uint8)
    g = np.zeros_like(mask).astype(np.uint8)
    b = np.zeros_like(mask).astype(np.uint8)
    for l in range(0, n_classes):
        idx = mask == l
        r[idx] = colormap[l, 0]
        g[idx] = colormap[l, 1]
        b[idx] = colormap[l, 2]
    rgb = np.stack([r, g, b], axis=2)
    return rgb

def get_overlay(image, colored_mask):
    image = tf.keras.preprocessing.image.array_to_img(image)
    image = np.array(image).astype(np.uint8)
    overlay = cv2.addWeighted(image, 0.35, colored_mask, 0.65, 0)
    return overlay

def segmentation(input_image):
    image_tensor = read_image(input_image)
    prediction_mask = infer(image_tensor=image_tensor, model=model)
    prediction_colormap = decode_segmentation_masks(prediction_mask, colormap, 20)
    overlay = get_overlay(image_tensor, prediction_colormap)
    return (overlay, prediction_colormap)

img = Image.open('./image.jpg')
img = np.array(img)
overlay, segs = segmentation(img)
plt.imshow(overlay)
plt.show()

As you can see, this is the same model and globally the same code, but the result is very different. What I would like is to get the same result as with Keras, using PyTorch.

I share the notebook here: https://colab.research.google.com/drive/1mgWSRs4Z7lqag8vxBnq5vi65BohXNdMd?usp=sharing

Thanks a lot.

EDIT: using the ImageNet normalization, the result is better but not the expected one:

[Image: PyTorch result with ImageNet normalization]
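For reference, the ImageNet normalization I mean is sketched below; the mean/std are the standard ImageNet values, and transforms.Normalize replaces the manual * 2 - 1 scaling:

# ImageNet normalization instead of scaling to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((img_size, img_size)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])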

It seems that the models are not exactly the same: Keras uses DeepLabV3Plus while PyTorch uses DeepLabV3.

So, is there a difference? Both implementations expose 20 classes.
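One quick sanity check (a sketch, not from the notebook) is to compare the number of output channels of the two models; here keras_model is a hypothetical name for the model returned by from_pretrained_keras above:

# PyTorch: the class count is the channel dimension of the 'out' tensor
with torch.no_grad():
    dummy = torch.zeros(1, 3, img_size, img_size).to(device)
    print(model(dummy)['out'].shape[1])

# Keras: the class count is the last dimension of the model's output
print(keras_model.output_shape[-1])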


Solution

  • The reason is simple: the model provided by PyTorch is not made to segment body parts but city elements.

    The confusion came from the name of the model provided by keras-io.

    As I need to avoid installing PyTorch and TensorFlow in the same project, but I can use onnxruntime, I only had to convert the (old Keras version) model to ONNX format; see the sketch after this answer.

    So, my snippet (with normalization) was OK; only the wrong model was used.

    PS: I share the model here: https://huggingface.co/Metal3d/deeplabv3p-resnet50-human
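For reference, here is a minimal sketch of that conversion and of the onnxruntime inference, assuming tf2onnx is installed; the 512×512 input size is an assumption to adapt to the actual model:

import numpy as np
import onnxruntime as ort
import tensorflow as tf
import tf2onnx
from huggingface_hub import from_pretrained_keras

# Convert the Keras model to ONNX once, in an environment with TensorFlow...
model = from_pretrained_keras("keras-io/deeplabv3p-resnet50")
spec = (tf.TensorSpec((None, 512, 512, 3), tf.float32, name="input"),)  # assumed input size
tf2onnx.convert.from_keras(model, input_signature=spec,
                           output_path="deeplabv3p-resnet50.onnx")

# ...then run it anywhere with onnxruntime alone
session = ort.InferenceSession("deeplabv3p-resnet50.onnx")
input_name = session.get_inputs()[0].name
image = np.zeros((1, 512, 512, 3), dtype=np.float32)  # placeholder input
predictions = session.run(None, {input_name: image})[0]
mask = np.argmax(np.squeeze(predictions), axis=-1)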