matlab axes scatter

select limits for region with most values in scatter plot


Say I have a scatter plot:

dat = [1+(5-1).*rand(1000,1);89;92];
dat2 = dat+0.2;
scatter(dat,dat2);

As the plot shows, two points are much larger than the rest of the values. Is there a way to obtain the axis limits for the region where the majority of the values lie?


Solution

  • That depends on your definition of "majority", but for tasks like this you should usually employ statistical tools, such as mean and std.

    Let's assume that the majority of points lie within one standard deviation from the mean value. According to this logic, you need to find all the points that fall within that range in the x-axis and in the y-axis:

    xmaj = dat(abs(dat - mean(dat)) < std(dat));
    ymaj = dat2(abs(dat2 - mean(dat2)) < std(dat2));
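
    If one standard deviation turns out to be too tight or too loose for your data, the same filter can be written with an adjustable multiplier. This is only a sketch: the name k and its value are assumptions for illustration, and k = 1 reproduces the two lines above.

    ```matlab
    % Sample data from the question: 1000 values in [1, 5] plus two outliers.
    dat  = [1 + (5-1).*rand(1000,1); 89; 92];
    dat2 = dat + 0.2;

    % k is a hypothetical tuning parameter (number of standard deviations).
    k = 1;

    % Keep the values that lie within k standard deviations of the mean,
    % separately for the x and y coordinates.
    xmaj = dat(abs(dat - mean(dat)) < k*std(dat));
    ymaj = dat2(abs(dat2 - mean(dat2)) < k*std(dat2));
    ```

    Note that the two outliers inflate std considerably, so for the sample data even k = 1 keeps all of the values between 1 and 5.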
    

    Now xmaj and ymaj contain the coordinates of the "majority" of points. To get the axis limits of the region that contains the majority of points, just do:

    xlims = [min(xmaj), max(xmaj)]
    ylims = [min(ymaj), max(ymaj)]
    

    For your example, you should get something like this (the exact values vary from run to run because the data come from rand):

    xlims =
    
        1.0053    4.9969
    
    
    ylims =
    
        1.2053    5.1969
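
    To zoom the plot to that region, the limits can be passed to xlim and ylim. A minimal sketch, repeating the steps above so that it runs on its own:

    ```matlab
    % Sample data from the question: 1000 values in [1, 5] plus two outliers.
    dat  = [1 + (5-1).*rand(1000,1); 89; 92];
    dat2 = dat + 0.2;

    % Values within one standard deviation of the mean, per axis.
    xmaj = dat(abs(dat - mean(dat)) < std(dat));
    ymaj = dat2(abs(dat2 - mean(dat2)) < std(dat2));

    % Limits of the region that contains the "majority" of points.
    xlims = [min(xmaj), max(xmaj)];
    ylims = [min(ymaj), max(ymaj)];

    % Plot all points, then restrict the axes to the majority region.
    scatter(dat, dat2);
    xlim(xlims);
    ylim(ylims);
    ```

    The two outliers are still part of the plotted data; they simply fall outside the visible range.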