3d gnuplot density-plot

How to make a 3D density plot based on the class of each point


I am trying to make an XYZ plot of my data points. Each data point has an associated value: '1' for error or '0' for success. My data is in this link. As a first attempt, I have used splot:

splot "data_all.dat" u 1:2:3:4 w points ls 1 palette title "P_{error}"

[Image: output of the splot command above]

The problem with this plot is that it is hard to judge the relative positions of the points and where they lie in space. A previous question, How to plot (x,y,z) points showing their density, provides a solution that grades the color according to the density of points.

I would like to extend that question by including the class of each point (error or success) in the coloring criteria. That is, take both types of points into account for the coloring, and plot the points of all classes.

I have no precise method for the coloring, but one idea is to use a function like (1 - a) x (num_success_in_delta) + (a x num_errors_in_delta), where a is a real number in [0,1] that weights the numbers of error and success points in a ball of size delta. Interpolating between the error and success samples in XYZ could be another way, but I do not know how to do that in Gnuplot.

To convey the density of points better, a projection of isolines or a 2D density plot onto the XY plane might help, if that is possible. I want to produce an EPS file to include as a figure in LaTeX, with the quality that Gnuplot or pgfplots provides.
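
To make the idea concrete, here is a minimal sketch of that weighting in Python (not Gnuplot). The function name weighted_score and the sample counts are made up for illustration only.

```python
def weighted_score(num_success, num_errors, a):
    """Return (1 - a) * num_success + a * num_errors for a weight a in [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return (1.0 - a) * num_success + a * num_errors


# Hypothetical counts inside one ball: 8 successes, 2 errors.
print(weighted_score(8, 2, 0.0))  # a = 0: only the successes count
print(weighted_score(8, 2, 1.0))  # a = 1: only the errors count
print(weighted_score(8, 2, 0.5))  # a = 0.5: both weighted equally
```

With a = 0 the score is just the success count, with a = 1 just the error count, and values in between blend the two.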

Regards


Solution

  • The code below is a slight modification of the solution in How to plot (x,y,z) points showing their density. The occurrences of errors and successes are counted in a certain volume (2*DeltaX x 2*DeltaY x 2*DeltaZ) around each data point. The results are stored in a file, so you have to do the counting only once (your data of 10'000 lines took about 1h15min on my old PC). Maybe the gnuplot code can be made more efficient. Afterwards, you take the second code below and quickly plot the resulting file. I am not sure which way of coloring is best; you can play with the color palette. Just as an example, the code below uses red (-1) for the maximum error count (i.e. density) and green (+1) for the maximum success density. I hope this is somehow helpful as a starting point for further optimization.

    ### 3D density plot
    reset session
    
    FILE = "data_all.dat"
    
    DeltaX = 0.5  # half boxwidth
    DeltaY = 0.5  # half boxlength
    DeltaZ = 0.5  # half boxheight
    
    TimeStart = time(0.0)
    
    # put the datafile/dataset into arrays
    stats FILE nooutput
    RowCount = STATS_records
    array ColX[RowCount]
    array ColY[RowCount]
    array ColZ[RowCount]
    array ColR[RowCount]   # Result 0=Success, 1=Error
    array ColCE[RowCount]  # Counts Error
    array ColCS[RowCount]  # Counts Success
    # a single pass over the file fills all four arrays (no outer loop needed)
    set table $Dummy
        plot FILE u (ColX[$0+1]=$1,0):(ColY[$0+1]=$2,0):(ColZ[$0+1]=$3,0):(ColR[$0+1]=$4,0) with table
    unset table
    
    # look at each datapoint and its surroundings
    Error = 1
    Success = 0
    do for [i=1:RowCount] {
        print sprintf("Datapoint %g of %g",i,RowCount)
        x0 = ColX[i]
        y0 = ColY[i]
        z0 = ColZ[i]
        # count the datapoints with distances <Delta around the datapoint of interest
        set table $ErrorOccurrences
            plot FILE u ((abs(x0-$1)<DeltaX) & (abs(y0-$2)<DeltaY) & (abs(z0-$3)<DeltaZ) & ($4==Error)? 1 : 0):(1) smooth frequency
        unset table
        set table $SuccessOccurrences
            plot FILE u ((abs(x0-$1)<DeltaX) & (abs(y0-$2)<DeltaY) & (abs(z0-$3)<DeltaZ) & ($4==Success) ? 1 : 0):(1) smooth frequency
        unset table
        # extract the count (the row with x=1) which will be used to color the datapoint;
        # c0 is reset first so that a missing x=1 row (no match) gives 0, not a stale value
        c0 = 0
        set table $ErrorDummy
            plot $ErrorOccurrences u (c0=$2,0):($0) every ::1::1 with table
        unset table
        ColCE[i] = c0
        c0 = 0
        set table $SuccessDummy
            plot $SuccessOccurrences u (c0=$2,0):($0) every ::1::1 with table
        unset table
        ColCS[i] = c0
    }
    
    # put the arrays into a dataset again
    set print $Data
    do for [i=1:RowCount] {
        print sprintf("%g\t%g\t%g\t%g\t%g\t%g",ColX[i],ColY[i],ColZ[i],ColR[i],ColCE[i],ColCS[i])
    }
    set print
    
    stats $Data u 5:6 nooutput
    CEmax = STATS_max_x
    CSmax = STATS_max_y
    print CEmax, CSmax
    
    TimeEnd = time(0.0)
    print sprintf("Duration: %.3f sec",TimeEnd-TimeStart)
    
    set print "data_all_color.dat"
        print $Data
    set print
    
    set palette defined (-1 "red", 0 "white", 1 "green")
    splot $Data u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.5 pt 7 lc palette z notitle
    ### end of code
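
The counting step can also be done outside gnuplot. The following is a minimal Python sketch, assuming the same four-column layout (x, y, z, result) and the same box criterion as the script above. It is not part of the original solution: the function names count_in_box and format_rows and the four demo points are made up for illustration, and it has not been timed on the 10'000-line file.

```python
def count_in_box(points, dx=0.5, dy=0.5, dz=0.5):
    """Count Error (1) and Success (0) points in the box around each point.

    points: list of (x, y, z, result) tuples.
    Returns rows (x, y, z, result, n_error, n_success); each point counts itself.
    """
    rows = []
    for x0, y0, z0, r0 in points:
        n_err = n_suc = 0
        for x, y, z, r in points:
            # same criterion as the gnuplot script: strict "<" on every axis
            if abs(x0 - x) < dx and abs(y0 - y) < dy and abs(z0 - z) < dz:
                if r == 1:
                    n_err += 1
                elif r == 0:
                    n_suc += 1
        rows.append((x0, y0, z0, r0, n_err, n_suc))
    return rows


def format_rows(rows):
    """Tab-separated text, one row per line, using %g as the gnuplot script does."""
    return "\n".join("\t".join("%g" % v for v in row) for row in rows)


# Made-up demo points: three close together, one far away.
demo = [
    (0.0, 0.0, 0.0, 1),
    (0.2, 0.1, 0.0, 1),
    (0.3, 0.3, 0.3, 0),
    (2.0, 2.0, 2.0, 0),
]
rows = count_in_box(demo)
print(format_rows(rows))
```

Writing the output of format_rows to data_all_color.dat should give a file with the same six columns that the plotting script reads. The double loop is still quadratic in the number of points, so this only changes the tool, not the algorithm.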
    

    Once you have counted the occurrences, simply plot the new data file and play with the color palette.

    ### 3D density plot
    reset session
    
    FILE = "data_all_color.dat"
    
    stats FILE u 5:6 nooutput  # get maximum count from Error and Success
    CEmax = STATS_max_x
    CSmax = STATS_max_y
    print CEmax, CSmax
    
    set ztics 0.2
    set view 50,70
    set palette defined (-1 "red", 0 "white", 1 "green")
    
    splot FILE u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.2 pt 7 lc palette z notitle
    ### end of code
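
To see which numbers end up on the palette, the expression ($4==1 ? -$5/CEmax : $6/CSmax) can be mirrored in a few lines of Python. This is only an illustrative sketch: the name signed_density and the sample rows are made up, and it assumes both maximum counts are nonzero.

```python
def signed_density(result, n_error, n_success, ce_max, cs_max):
    """Mirror of ($4==1 ? -$5/CEmax : $6/CSmax).

    Error points map to [-1, 0) (red side of the palette),
    success points map to (0, 1] (green side).
    """
    if result == 1:
        return -n_error / ce_max
    return n_success / cs_max


# Made-up rows: (result, n_error, n_success)
sample = [(1, 4, 1), (1, 2, 3), (0, 1, 5), (0, 0, 10)]
ce_max = max(r[1] for r in sample)  # plays the role of CEmax
cs_max = max(r[2] for r in sample)  # plays the role of CSmax

values = [signed_density(r, e, s, ce_max, cs_max) for r, e, s in sample]
print(values)
```

The error point with the highest error count lands on -1 and the success point with the highest success count on +1, which is why the palette is defined from -1 (red) over 0 (white) to 1 (green).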
    

    For example with your data:

    [Image: resulting 3D plot for the data from the question]

    Addition: Columns $5 and $6 currently contain the absolute numbers of occurrences of Error and Success in a certain volume, respectively. If you want an error probability (I am not a statistician), my guess would be that you have to divide the occurrences of Error $5 by the total number of events $5+$6 in this volume.

    splot FILE u 1:2:3:($5/($5+$6)) w p ps 0.2 pt 7 lc palette z notitle

    The palette for the other example was set palette rgb 33,13,10. In general, for the palette, consult help palette and you'll find a lot of details.
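
As a small worked example of the error fraction $5/($5+$6) from the addition above, here is a Python sketch. The name error_fraction and the counts are made up for illustration, and the guard for an empty volume is an extra assumption (in the script every point counts itself, so the total should not be zero).

```python
def error_fraction(n_error, n_success):
    """Share of Error events among all events in the volume: $5 / ($5 + $6)."""
    total = n_error + n_success
    if total == 0:
        return float("nan")  # no events in the volume, fraction undefined
    return n_error / total


print(error_fraction(2, 1))  # 2 errors out of 3 events
print(error_fraction(0, 4))  # no errors in the volume
print(error_fraction(3, 0))  # only errors in the volume
```

The result always lies between 0 and 1, so it can be used directly as the palette value without dividing by a maximum count.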