Search code examples
matlabcontourcovarianceellipsedata-extraction

Looking for a tool that extracts data from a plot figure ( here 2D contours from Covariance matrix or Markov chains) and reproduce the original figure


I am looking for an application or a tool which is able for example to extract data from a 2D contour plot like below :

data to extract

I have seen https://dash-gallery.plotly.host/Portal/ tool or https://plotly.com/dash/ , https://automeris.io/ , but I have test them and this is difficult to extract data (here actually, the data are covariance matrices with ellipses, but I would like to extend it if possible to Markov chains).

If someone could know if there are more efficient tools, mostly from this kind of 2D plot. I am also opened to commercial applications. I am on MacOS 11.3.

If I am not on the right forum, please let me know it.

UPDATE 1:

I tried to apply the method in Matlab with the script below from this previous post :

%// Import the data:
imdata = importdata('Omega_L_Omega_m.png');
Gray = rgb2gray(imdata.cdata);
colorLim = [-1 1]; %// this should be set manually
%// Get the area of the data:
f = figure('Position',get(0,'ScreenSize'));
imshow(imdata.cdata,'Parent',axes('Parent',f),'InitialMagnification','fit');
%// Get the area of the data:
title('Click with the cross on the most top left area of the *data*')
da_tp_lft = round(getPosition(impoint));
title('Click with the cross on the most bottom right area of the *data*') 
da_btm_rgt = round(getPosition(impoint));
dat_area = double(Gray(da_tp_lft(2):da_btm_rgt(2),da_tp_lft(1):da_btm_rgt(1)));
%// Get the area of the colorbar:
title('Click with the cross within the upper most color of the *colorbar*')
ca_tp_lft = round(getPosition(impoint));
title('Click with the cross within the bottom most color of the *colorbar*')
ca_btm_rgt = round(getPosition(impoint));
cmap_area = double(Gray(ca_tp_lft(2):ca_btm_rgt(2),ca_tp_lft(1):ca_btm_rgt(1)));
close(f)
%// Convert the colormap to data:
data = dat_area./max(cmap_area(:)).*range(colorLim)-abs(min(colorLim));

It seems that I get data in data array but I don't know how to exploit it to reproduce the original figure from these data.

Could anyone see how to plot with Matlab this kind of plot with the data I have normally extracted (not sure the Matlab. script has generated all the data for green, orange and blue contours, with each confidence level, that is to say, 68%, 95%, 99.7%) ?

UPDATE 2: I have had first elements of answer on the following link :

partial answer but not fully completed

I cite elements of the approach :

clc
clear all;
imdata = imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/642495/image.png');
close all;
Gray = rgb2gray(imdata);
yax=sum(conv2(single(Gray),[-1 -1 -1;0 0 0; 1 1 1],'valid'),2);
xax=sum(conv2(single(Gray),[-1 -1 -1;0 0 0; 1 1 1]','valid'),1);
figure(1),subplot(211),plot(xax),subplot(212),plot(yax)

rgb2gray

ROIy = find(abs(yax)>1e5);
    ROIyinner = find(diff(ROIy)>5);
    ROIybounds = ROIy([ROIyinner ROIyinner+1]);
ROIx = find(abs(xax)>1e5);
    ROIxinner = find(diff(ROIx)>5);
    ROIxbounds = ROIx([ROIxinner ROIxinner+1]);
PLTregion = Gray(ROIybounds(1):ROIybounds(2),ROIxbounds(1):ROIxbounds(2));
PLTregion(PLTregion==255)=nan;
figure(2),imagesc(PLTregion)

different regions

[N X]=hist(single(PLTregion(:)),0:255);
figure(3),plot(X,N),set(gca,'yscale','log')

histogram

PLTitems = find(N>2000)% %limit "color" of interest to items with >1000 pixels
PLTitems = 1×10

1 67 90 101 129 132 144 167 180 194

PLTvalues = X(PLTitems);
PLTvalues(1)=[]; %ignore black?
%test out region 1
for ind = 1:numel(PLTvalues)
    temp = zeros(size(PLTregion)); 
    temp(PLTregion==PLTvalues(ind) | (PLTregion<=50 & PLTregion>10))=255; 
%     figure(100), imagesc(temp)
    temp = bwareaopen(temp,1000);
    temp = imfill(temp,'holes');
    figure(100), subplot(3,3,ind),imagesc(temp)
    figure(101), subplot(3,3,ind),imagesc(single(PLTregion).*temp,[0 255])
end

recognition multiple area

If someone could know how to improve these first interesting results, this would be fine to mention it.


Solution

  • Restating the problem - My understanding given the different comments and your updates is the following:

    • someone other than you is in possession of data, which as it happens is 2D data, i.e. an Nx2 matrix;
    • using the covariance matrix, they are effectively saying something about the joint distribution of these two dimensions, specifically about the variance;
    • if they assume a Gaussian distribution, as is implied by your comment regarding 68%, 95% and 99.7% for 1sigma, 2sigma and 3sigma, they can draw ellipses which represent the 2D-normal distribution: these are in fact some of the contour lines associated with the 3D "bell" surface;
    • you have obtained the contour lines in a graph and are trying to obtain the covariance matrix (not the original data...);
    • you are concerned about the complexity of having to extract the information from each ellipsis.

    Partial answer:

    • It is impossible to recover the original data, I hope you are already aware of that, but in case you are not let's just note that the covariance matrix is a summary statistic of the data, much like the average, and although it says something about the data many different datasets could happen to have the same summary statistic (the same way many different sets of numbers can give you an average of 10).
    • It is possible to somewhat recover the covariance matrix, i.e. the 3 numbers a, b and c in the matrix [a,b;b,c], though the error in doing so will likely be large because of how imprecise the pixel representation is. Essentially, you will be looking for the dimensions of the two axes, for the variances, as well as the angle of one of the axes, for the covariance.
    • Unless I am mistaken, under the Gaussian assumption above, you only need to measure this for one of the three ellipses, and then factor by whatever number of sigmas that contour represents. Here you might want to either use the best-defined ellipse, or attempt to use the largest one, which will provide the maximum precision for your measurements (cf. pixelization).
    • Also, the problem of finding the axes and angle for the ellipse need not be as complex as what it seems like in your first trials: instead of trying to find the contour of the ellipses, find the bounding rectangle.
    • In order to further simplify this process, if your images are color-coded the way you show, then a filter on blue pixels might be enough in terms of image processing. Then simply take the minimum and maximum (x,y) coordinates in order to obtain the bounding rectangle.
    • Once the bounding rectangle is obtained, find the equation to your ellipse (that's a question for a math group, but you could start here for example).

    Happy filtering!