I am trying to make a new semantic segmentation model that will take grainy microscopy images as input and segment them.
I have all the input and ground truth images in .png format, and I'm having a hard time curating them into a dataset that others can use. I've looked into some articles, but they explain how to make label images that I already have. So, is there a way/software which I can use to curate the dataset?
Thanks
You can also use the open-source tool, FiftyOne, to curate the dataset in a way that allows you to share it and also easily visualize, explore, and analyze it along with any future model predictions.
FiftyOne has a Python API that will load your instance or semantic segmentation labels into a FiftyOne Dataset which you can then query and visualize in the App (both the raw images and the annotations).
If you store your images and segmentations on disk in this file structure:
segmentation_dataset
|
+--- data
|    |
|    +--- 000.png
|    +--- 001.png
|    +--- 002.png
|    ...
|
+--- labels
     |
     +--- 000.png
     +--- 001.png
     +--- 002.png
     ...
Then you can load it into Python and visualize it with these lines of code:
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="segmentation_dataset",
    dataset_type=fo.types.ImageSegmentationDirectory,
    name="segmentation_dataset",
    force_grayscale=True,
)

# Visualize the dataset in your browser
session = fo.launch_app(dataset)
Note: Pass force_grayscale=True when your masks are saved as RGB images, like the one you provided, so that they are loaded as single-channel masks.
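Since the layout above relies on every image in data/ having a mask with the same filename in labels/, it can be worth checking the pairing before loading. Here is a minimal sketch using only the standard library; find_unpaired is a hypothetical helper name, not part of FiftyOne, and it assumes files are matched by name without the extension.

```python
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (images_without_mask, masks_without_image) as sorted filename stems."""
    root = Path(dataset_dir)
    # Compare by filename without extension, e.g. "000" for "000.png"
    images = {p.stem for p in (root / "data").iterdir() if p.is_file()}
    masks = {p.stem for p in (root / "labels").iterdir() if p.is_file()}
    return sorted(images - masks), sorted(masks - images)


# Example usage (assumes the folder from the structure above exists):
# missing_masks, missing_images = find_unpaired("segmentation_dataset")
# print("images without a mask:", missing_masks)
# print("masks without an image:", missing_images)
```

Two empty lists mean every image has a mask and vice versa.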
You can add and modify samples and labels on your dataset with the FiftyOne API and then export it to disk in a variety of formats (VOC, COCO, YOLO, CVAT, etc.). From there, you can zip the exported directory so that others can easily load it back into FiftyOne.
For example, we can use the FiftyOneDataset format as it works for any label type:
dataset.export(
    export_dir="/path/to/export_dir",
    dataset_type=fo.types.FiftyOneDataset,
)
Zip the exported directory and send it to someone else. After unzipping it, they can run:
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/unzipped_dataset",
    dataset_type=fo.types.FiftyOneDataset,
)
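The zipping and unzipping can also be done from Python with the standard library. This is only a sketch: zip_export and unzip_export are hypothetical helper names, and the paths in the usage comments are placeholders for your own.

```python
import shutil


def zip_export(export_dir, archive_base):
    """Zip the contents of export_dir into archive_base + '.zip'; return the archive path."""
    return shutil.make_archive(archive_base, "zip", root_dir=export_dir)


def unzip_export(archive_path, target_dir):
    """Extract the archive into target_dir, which can then be passed as dataset_dir."""
    shutil.unpack_archive(archive_path, target_dir)
    return target_dir


# Sender:
# zip_export("/path/to/export_dir", "/path/to/segmentation_dataset")
# Receiver:
# unzip_export("/path/to/segmentation_dataset.zip", "/path/to/unzipped_dataset")
```

Because root_dir is set to the export directory, its contents sit at the top level of the archive, so the unzipped folder can be loaded directly.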
If you store splits of data in the folder structure shown below:
segmentation_dataset
|
+--- Train
|    |
|    +--- data
|    |    |
|    |    +--- 000.png
|    |    +--- 001.png
|    |    +--- 002.png
|    |    ...
|    |
|    +--- labels
|         |
|         +--- 000.png
|         +--- 001.png
|         +--- 002.png
|         ...
+--- Test
|    |
|    ...
...
You can then load all of the samples into a single dataset and add a tag to each one denoting which split it belongs to:
import fiftyone as fo

dataset_type = fo.types.ImageSegmentationDirectory

# Load the training split and tag its samples
dataset = fo.Dataset.from_dir(
    dataset_dir="segmentation_dataset/Train",
    dataset_type=dataset_type,
    tags="train",
    name="segmentation_dataset",
)

# Add the test split to the same dataset with its own tag
dataset.add_dir(
    dataset_dir="segmentation_dataset/Test",
    dataset_type=dataset_type,
    tags="test",
)
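Once the samples are tagged, each split can be pulled back out as a view with match_tags(). A small sketch, where split_views is a hypothetical helper and the split names are assumed to be the tags used above:

```python
def split_views(dataset, splits=("train", "test")):
    """Return {split_name: view of the samples carrying that tag}."""
    # match_tags() keeps only the samples that have the given tag
    return {split: dataset.match_tags(split) for split in splits}


# Example usage with the dataset loaded above:
# views = split_views(dataset)
# print(len(views["train"]), len(views["test"]))
```

Views do not copy the data, so this is cheap to do even on a large dataset.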
From there, you can use this dataset directly to train a model (for example, with PyTorch or PyTorch Lightning Flash).