How to make a 3D density plot based on the class of each point

I am trying to make an XYZ plot of my data points. Each data point has an associated value: '1' for error or '0' for success. My data is in this link. As a first attempt, I used splot:

splot "data_all.dat" u 1:2:3:4 w points ls 1 palette title "P_{error}"

The problem with this plot is that it is hard to distinguish the relative position of the points and their location in space. To solve this, a previous question, How to plot (x,y,z) points showing their density, provides a solution which grades the color based on the density of points.

I would like to extend that question by including the class of each point (error or success) in the coloring criteria. That is, take both types of points into account for the coloring, and plot the points of all classes.

I do not have a precise way to do the coloring, but one idea would be to use a function like (1 - a) x (num_success_in_delta) + a x (num_errors_in_delta), where a is a real number in [0,1] that weights the number of error and success points in a ball delta. Interpolating between the XYZ points of error and success samples could be another way, but I do not know how that can be done in gnuplot.

To convey the density of points better, a projection of isolines or a 2D density plot onto the XY plane might help, if possible. I am looking to produce an EPS file to include as a figure in LaTeX, with the quality that gnuplot or pgfplots provides.

Regards

Solution

The code below is a slight modification of the solution to How to plot (x,y,z) points showing their density. For each data point, the occurrences of errors and successes are counted within a box of size 2*DeltaX x 2*DeltaY x 2*DeltaZ centered on that point. The results are stored in a file, so you have to do the counting only once (your data with 10'000 lines took about 1h15min on my old PC). The gnuplot code can probably be made more efficient. Afterwards, you take the second code below and quickly plot the resulting file. I am not sure which way of coloring is best; you can play with the color palette. Just as an example, the code below uses red (-1) for the maximum error count (i.e. density) and green (+1) for the maximum success density. I hope this is somehow helpful as a starting point for further optimization.

### 3D density plot
reset session

FILE = "data_all.dat"

DeltaX = 0.5  # half boxwidth
DeltaY = 0.5  # half boxlength
DeltaZ = 0.5  # half boxheight

TimeStart = time(0.0)

# put the datafile/dataset into arrays
stats FILE nooutput
RowCount = STATS_records
array ColX[RowCount]
array ColY[RowCount]
array ColZ[RowCount]
array ColR[RowCount]   # Result 0=Success, 1=Error
array ColCE[RowCount]  # Counts Error
array ColCS[RowCount]  # Counts Success
# a single pass over the file fills all four arrays, no loop is needed
set table $Dummy
    plot FILE u (ColX[$0+1]=$1,0):(ColY[$0+1]=$2,0):(ColZ[$0+1]=$3,0):(ColR[$0+1]=$4,0) with table
unset table

# look at each datapoint and its surroundings
Error = 1
Success = 0
do for [i=1:RowCount] {
    print sprintf("Datapoint %g of %g",i,RowCount)
    x0 = ColX[i]
    y0 = ColY[i]
    z0 = ColZ[i]
    # count the datapoints with distances <Delta around the datapoint of interest
    set table $ErrorOccurrences
        plot FILE u ((abs(x0-$1)<DeltaX) & (abs(y0-$2)<DeltaY) & (abs(z0-$3)<DeltaZ) & ($4==Error)? 1 : 0):(1) smooth frequency
    unset table
    set table $SuccessOccurrences
        plot FILE u ((abs(x0-$1)<DeltaX) & (abs(y0-$2)<DeltaY) & (abs(z0-$3)<DeltaZ) & ($4==Success) ? 1 : 0):(1) smooth frequency
    unset table
    # extract the number from $Occurrences which will be used to color the datapoint
    c0 = 0   # reset, so a box without any Error does not reuse the previous count
    set table $ErrorDummy
        plot $ErrorOccurrences u (c0=$2,0):($0) every ::1::1 with table
    unset table
    ColCE[i] = c0
    c0 = 0   # reset, so a box without any Success does not reuse the Error count
    set table $SuccessDummy
        plot $SuccessOccurrences u (c0=$2,0):($0) every ::1::1 with table
    unset table
    ColCS[i] = c0
}

# put the arrays into a dataset again
set print $Data
do for [i=1:RowCount] {
    print sprintf("%g\t%g\t%g\t%g\t%g\t%g",ColX[i],ColY[i],ColZ[i],ColR[i],ColCE[i],ColCS[i])
}
set print

stats $Data u 5:6 nooutput
CEmax = STATS_max_x
CSmax = STATS_max_y
print CEmax, CSmax

TimeEnd = time(0.0)
print sprintf("Duration: %.3f sec",TimeEnd-TimeStart)

set print "data_all_color.dat"
    print $Data
set print

set palette defined (-1 "red", 0 "white", 1 "green")
splot $Data u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.5 pt 7 lc palette z notitle
### end of code

Once you have counted the occurrences, simply plot the new data file and play with the color palette.

### 3D density plot
reset session

FILE = "data_all_color.dat"

stats FILE u 5:6 nooutput  # get maximum count from Error and Success
CEmax = STATS_max_x
CSmax = STATS_max_y
print CEmax, CSmax

set ztics 0.2
set view 50,70
set palette defined (-1 "red", 0 "white", 1 "green")

splot FILE u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.2 pt 7 lc palette z notitle
### end of code

For example with your data:

Addition: Columns 5 and 6 currently contain the absolute number of occurrences of Error and Success, respectively, in a certain volume. If you want an error probability instead, my guess (I am not a statistician) is that you have to divide the occurrences of Error ($5) by the total number of events ($5+$6) in this volume.

splot FILE u 1:2:3:($5/($5+$6)) w p ps 0.2 pt 7 lc palette z notitle
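The weighting idea from the question can be tried on the same file. The following is only a sketch: the weight a and the helper function Score are assumptions that are not part of the code above, and the counts come from the box around each point, not from a ball.

```gnuplot
### weighted coloring (sketch, not tested with the original data)
FILE = "data_all_color.dat"

a = 0.5                           # assumed weight in [0,1]: 0 = only Success counts, 1 = only Error counts
Score(ce,cs) = a*ce + (1-a)*cs    # hypothetical helper: ce = column 5 (Error), cs = column 6 (Success)

splot FILE u 1:2:3:(Score($5,$6)) w p ps 0.2 pt 7 lc palette z notitle
### end of code
```

Note that this score is a weighted density and does not tell you by its sign which class dominates. If you need that, the signed coloring or the ratio above is probably the better choice.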

The palette for the other example was set palette rgb 33,13,10. In general, for the palette, consult help palette and you'll find a lot of details.
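For the EPS file mentioned in the question, a minimal sketch could look like the following. The output name density.eps is just a placeholder, and the view and palette settings are taken from the examples above; adjust them as needed.

```gnuplot
### write the plot to an EPS file (sketch)
reset session

FILE = "data_all_color.dat"

set terminal postscript eps enhanced color
set output "density.eps"    # placeholder output name

set view 50,70
set palette rgb 33,13,10

# $5+$6 is at least 1, because each point lies inside its own box
splot FILE u 1:2:3:($5/($5+$6)) w p ps 0.2 pt 7 lc palette z notitle

set output                  # close the file so that it is written completely
### end of code
```

gnuplot also has an epslatex terminal, which leaves the typesetting of the labels to LaTeX; check help epslatex for the details.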