How to read data at specific coordinates in a high-dimensional Mat using C++?


I am trying to use the MobileNet SSD model with the deep neural network (dnn) module in OpenCV for object detection. I loaded and ran the model successfully. As the output of net.forward I obtain a Mat object containing information about the detected objects. Unfortunately, I struggle with the "easy part" of the work: reading what exactly was detected.

Here is what I know about the output Mat object:

  • It has 4 dimensions.
  • The size is 1 x 1 x number_of_objects_detected x 7.
  • Each object is described by seven values. Using zero-based indices: index 0 is the image ID within the batch, index 1 is the class ID, index 2 is the confidence, and indices 3-6 are the bounding box values.

I can't find any C++ example, but I found many Python examples. They read the data like this:

for i in np.arange(0, detections.shape[2]):    
    confidence = detections[0, 0, i, 2]

What is the easiest way to do this in C++? That is, I need to read the data at specific coordinates in a high-dimensional Mat.

Thank you for your kind help. I am quite new to C++ and sometimes find it overwhelming.

I am using OpenCV 3.3.0. The GitHub repository with the MobileNet SSD I am using: https://github.com/chuanqi305/MobileNet-SSD.

The code of my program:

#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

#include <fstream>
#include <iostream>

using namespace cv;
using namespace cv::dnn;

using namespace std;

// function to create vector of class names
std::vector<String> createClaseNames() {
    std::vector<String> classNames;
    classNames.push_back("background");
    classNames.push_back("aeroplane");
    classNames.push_back("bicycle");
    classNames.push_back("bird");
    classNames.push_back("boat");
    classNames.push_back("bottle");
    classNames.push_back("bus");
    classNames.push_back("car");
    classNames.push_back("cat");
    classNames.push_back("chair");
    classNames.push_back("cow");
    classNames.push_back("diningtable");
    classNames.push_back("dog");
    classNames.push_back("horse");
    classNames.push_back("motorbike");
    classNames.push_back("person");
    classNames.push_back("pottedplant");
    classNames.push_back("sheep");
    classNames.push_back("sofa");
    classNames.push_back("train");
    classNames.push_back("tvmonitor");
    return classNames;
}

// main function
int main(int argc, char **argv)
{
    // set inputs
    String modelTxt = "C:/Users/acer/Desktop/kurz_OCV/cv4faces/project/python/object-detection-deep-learning/MobileNetSSD_deploy.prototxt";
    String modelBin = "C:/Users/acer/Desktop/kurz_OCV/cv4faces/project/python/object-detection-deep-learning/MobileNetSSD_deploy.caffemodel";
    String imageFile = "C:/Users/acer/Desktop/kurz_OCV/cv4faces/project/puppies.jpg";
    std::vector<String> classNames = createClaseNames();

    //read caffe model
    Net net;
    try {
        net = dnn::readNetFromCaffe(modelTxt, modelBin);
    }
    catch (cv::Exception& e) {
        std::cerr << "Exception: " << e.what() << std::endl;
    }
    // check for an empty network whether or not an exception was thrown
    if (net.empty())
    {
        std::cerr << "Can't load network." << std::endl;
        exit(-1);
    }

    // read image 
    Mat img = imread(imageFile);

    // create input blob
    resize(img, img, Size(300, 300));
    Mat inputBlob = blobFromImage(img, 0.007843, Size(300, 300), Scalar(127.5, 127.5, 127.5)); // convert the image to a 4D blob; the mean is subtracted from all three channels

    // apply the blob on the input layer
    net.setInput(inputBlob); //set the network input

    // classify the image by applying the blob on the net
    Mat detections = net.forward("detection_out"); //compute output

    // print some information about detections
    std::cout << "dims: " << detections.dims << endl;
    std::cout << "size: " << detections.size << endl;

    //show image
    String winName("image");
    imshow(winName, img);

    // Wait for keypress
    waitKey();

}

Solution

  • Check out the official OpenCV tutorial on how to scan images.

    The usual way to access an element of a 3-channel (i.e. color) Mat is the Mat::at() method, which is heavily overloaded for all sorts of accessor options. For an n-dimensional Mat specifically, you can pass in an array of indices or a vector of indices (a cv::Vec<int, n>).


    Here's a basic example that creates a 4D Mat and accesses a specific element:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    
    int main() {
        int size[4] = { 2, 2, 5, 7 };
        cv::Mat M(4, size, CV_32FC1, cv::Scalar(1));
        int indx[4] = { 0, 0, 2, 3 };
        std::cout << "M[0, 0, 2, 3] = " << M.at<float>(indx) << std::endl;
    }
    
    M[0, 0, 2, 3] = 1