python deep-learning pytorch semantic-segmentation detectron

Incorrect positions of annotation polygons when drawing with Detectron2 Visualizer

Hello Stack Overflow community,

Apologies if this question seems trivial. I am working on building detection from aerial PNG images. Each image is 2000 × 2000 pixels at a 20 cm resolution. For this I am using Detectron2 from Facebook AI Research.

The Detectron2 tutorial notebook (https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5) defines a function, get_balloon_dicts, that reads the balloon annotation JSON file and converts it into Detectron2's standard dataset format so it can be registered with DatasetCatalog. The notebook then uses the Visualizer class to draw the annotations on the images, and the process works as expected for the balloon example.

I followed the tutorial for the balloon example and it worked well in my Anaconda environment: the annotations from the JSON file were drawn correctly on the balloon images, as shown in the attached screenshot.

However, when I applied the same process to my building images and annotation JSON file, the images were displayed correctly but the annotation polygons were not drawn. Instead, some labels appeared at the top of the image, as shown in the attached screenshot.

I expected the same result as in the balloon example, since the building annotation JSON file has a similar format, structure, and attributes. I used the following code, adapted from the balloon code in the Detectron2 tutorial, to register the dataset and visualize the annotations:

The code for registering the dataset:

import json
import os

import cv2
import numpy as np
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

def get_building_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width
      
        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            # Shift vertices to pixel centres, then flatten to [x0, y0, x1, y1, ...]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("building_" + d, lambda d=d: get_building_dicts("wisconsin_dataset2020/" + d))
    MetadataCatalog.get("building_" + d).set(thing_classes=["building"])
building_metadata = MetadataCatalog.get("building_train")
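Before visualizing, it can help to check that every polygon in the returned dicts actually lies inside its image. The helper below, find_out_of_bounds, is hypothetical (it is not part of Detectron2 or the tutorial). It only assumes the flattened [x0, y0, x1, y1, ...] polygon format that get_building_dicts produces, plus a small tolerance for the +0.5 pixel-centre shift. The demo record is made up for illustration.

```python
# Hypothetical helper (not part of Detectron2): flag annotations whose polygon
# vertices fall outside the image they belong to.
def find_out_of_bounds(dataset_dicts, tol=1.0):
    """Return (file_name, annotation_index, (min_x, min_y, max_x, max_y)) tuples
    for every annotation with a vertex outside [0, width] x [0, height].

    `tol` absorbs the +0.5 pixel-centre offset applied in get_building_dicts.
    """
    problems = []
    for record in dataset_dicts:
        width, height = record["width"], record["height"]
        for idx, ann in enumerate(record.get("annotations", [])):
            for poly in ann["segmentation"]:
                xs, ys = poly[0::2], poly[1::2]  # de-interleave x and y values
                box = (min(xs), min(ys), max(xs), max(ys))
                if (box[0] < -tol or box[1] < -tol
                        or box[2] > width + tol or box[3] > height + tol):
                    problems.append((record["file_name"], idx, box))
                    break  # one report per annotation is enough
    return problems


if __name__ == "__main__":
    # Made-up record: one triangle whose last vertex has a negative y.
    demo = [{
        "file_name": "tile.png",
        "width": 2000,
        "height": 2000,
        "annotations": [{"segmentation": [[10.5, 20.5, 50.5, 20.5, 50.5, -30.5]]}],
    }]
    print(find_out_of_bounds(demo))
    # → [('tile.png', 0, (10.5, -30.5, 50.5, 20.5))]
```

With the real data, it would be called as find_out_of_bounds(get_building_dicts("wisconsin_dataset2020/train")); an empty list means every polygon lies within its image.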

The code for visualizing the images and annotations:

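import random

import matplotlib.pyplot as plt

dataset_dicts = get_building_dicts("wisconsin_dataset2020/train")
for d in random.sample(dataset_dicts, 1):
    img = cv2.imread(d["file_name"])  # OpenCV loads images as BGR
    print(d["file_name"])
    # Visualizer expects an RGB image, so reverse the channel order
    visualizer = Visualizer(img[:, :, ::-1], metadata=building_metadata, scale=1.0)
    out = visualizer.draw_dataset_dict(d)
    # cv2_imshow(out.get_image()[:, :, ::-1])  # Colab-only helper, expects BGR
    plt.figure(figsize=(20, 20))
    # out.get_image() is already RGB, which is what plt.imshow expects
    plt.imshow(out.get_image())
    plt.show()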

Sample images and the annotation JSON file are available for troubleshooting at https://github.com/facebookresearch/detectron2/files/12052249/building_dataset.zip.

Any solutions or improvements to the code that would help me achieve the desired result shown in the provided screenshot would be greatly appreciated.

Thank you in advance for your assistance.

I expect to see an image like this:

Environment:

sys.platform: win32
Python: 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 15:53:35) [MSC v.1929 64 bit (AMD64)]
numpy: 1.24.3
detectron2: 0.6
DETECTRON2_ENV_MODULE
PyTorch: 2.0.1 @L:\projects\pythonlover\conda_projects\envs\detectron2gpu\lib\site-packages\torch
PyTorch debug build: False
torch._C._GLIBCXX_USE_CXX11_ABI: False
GPU available: Yes
GPU 0: Quadro RTX 5000 (arch=7.5)
Driver version: 522.06
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Pillow: 9.4.0
torchvision: 0.15.2 @L:\projects\pythonlover\conda_projects\envs\detectron2gpu\lib\site-packages\torchvision
torchvision arch flags: L:\projects\pythonlover\conda_projects\envs\detectron2gpu\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore: 0.1.5.post20221221
iopath: 0.1.9
cv2: 4.7.0

PyTorch built with:

C++ Version: 199711
MSVC 193431937
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 2019
LAPACK is enabled (usually provided by MKL)
CPU capability usage: AVX2
CUDA Runtime 11.8
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
CuDNN 8.7
Magma 2.5.4
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

Solution

I found the cause after analyzing the building JSON file: some polygon coordinates, mostly on the y-axis (all_points_y), were negative, which placed those polygons outside the image and broke the visualization. The issue was hard to spot because the file is large. Changing the negative values to positive ones resolved the problem.
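Because the negative values were hard to find by eye in a large file, a short script can list them and apply the same fix. The sketch below assumes the VIA-style layout read by get_building_dicts (a filename plus regions whose shape_attributes hold all_points_x and all_points_y); the function names are hypothetical. The fix replaces each negative value with its absolute value, matching the change described above, and fix_file writes to a separate output file so the original annotations are preserved.

```python
import json


def _regions(entry):
    """Return the region objects of one image entry in a VIA-style JSON file."""
    regions = entry["regions"]
    # The balloon-style (VIA 1.x) layout stores regions in a dict keyed by
    # index strings; VIA 2.x exports typically use a list. Accept both.
    return list(regions.values()) if isinstance(regions, dict) else list(regions)


def report_negative_coords(imgs_anns):
    """Return (filename, region_number, axis, value) for every negative coordinate."""
    hits = []
    for entry in imgs_anns.values():
        for n, region in enumerate(_regions(entry)):
            shape = region["shape_attributes"]
            for axis in ("all_points_x", "all_points_y"):
                hits.extend((entry["filename"], n, axis, v)
                            for v in shape.get(axis, []) if v < 0)
    return hits


def make_coords_positive(imgs_anns):
    """Replace each negative coordinate with its absolute value, in place.

    Returns the number of values that were changed.
    """
    changed = 0
    for entry in imgs_anns.values():
        for region in _regions(entry):
            shape = region["shape_attributes"]
            for axis in ("all_points_x", "all_points_y"):
                if axis in shape:
                    changed += sum(1 for v in shape[axis] if v < 0)
                    shape[axis] = [abs(v) for v in shape[axis]]
    return changed


def fix_file(src_path, dst_path):
    """Print negative coordinates found in src_path and write a fixed copy to dst_path."""
    with open(src_path) as f:
        imgs_anns = json.load(f)
    for hit in report_negative_coords(imgs_anns):
        print(*hit)
    changed = make_coords_positive(imgs_anns)
    with open(dst_path, "w") as f:
        json.dump(imgs_anns, f)
    return changed
```

Running fix_file on the train and val via_region_data.json files and pointing get_building_dicts at the corrected copies would reproduce the fix. Taking the absolute value is only appropriate when whole polygons have a flipped y-axis; if just a few stray vertices are slightly negative, clamping them to 0 may be the safer choice.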