I am trying to make an XYZ plot of my data points. Each data point has an associated value: '1' for error or '0' for success. My data is available at this link.
As a first attempt, I have used splot
splot "data_all.dat" u 1:2:3:4 w points ls 1 palette title "P_{error}"
The problem with this plot is that it is hard to see the relative positions of the points and where they lie in space. To address this, an earlier question, How to plot (x,y,z) points showing their density, provides a solution that grades the color according to the density of points.
I would like to extend that solution by including the class of each point (error or success) in the coloring criterion. That is, both types of points should be taken into account for the coloring, and points of all classes should be plotted.
I do not have a precise coloring scheme in mind, but one idea would be to use a function like $(1 - a)\,N_{\mathrm{success}}(\delta) + a\,N_{\mathrm{error}}(\delta)$, where $N_{\mathrm{success}}(\delta)$ and $N_{\mathrm{error}}(\delta)$ are the numbers of success and error points inside a ball of radius $\delta$ around the point, and $a \in [0,1]$ is a real number that weights the two counts. Interpolating the XYZ points between error and success samples could be another approach, but I do not know how that could be done in gnuplot.
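For concreteness, the weighting idea above can be sketched in Python. This is only an illustration under assumptions: a Euclidean ball of radius delta, labels 1 = error and 0 = success, and each point counting itself; the function name weighted_score is hypothetical. It compares every pair of points (O(n²)), which is fine for a quick test but not tuned for large files.

```python
import math


def weighted_score(points, labels, a, delta):
    """Return (1 - a) * N_success + a * N_error for every point.

    The counts include all points (the point itself too) whose Euclidean
    distance to it is smaller than `delta`. labels: 1 = error, 0 = success.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    scores = []
    for p in points:
        n_err = n_succ = 0
        for q, lab in zip(points, labels):
            if math.dist(p, q) < delta:  # inside the ball of radius delta
                if lab == 1:
                    n_err += 1
                else:
                    n_succ += 1
        scores.append((1.0 - a) * n_succ + a * n_err)
    return scores
```

With a = 0 the score reduces to the local success density and with a = 1 to the local error density; values in between blend the two.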
To better convey the density of points, a projection of isolines or a 2D density plot onto the XY plane might also help, if possible.
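One possible way to get such an XY density map, sketched in Python under stated assumptions: ignore z, bin the points into square cells of side cell (a hypothetical parameter), and report for every non-empty cell its center, its number of points and its fraction of errors. The function names are hypothetical; the resulting rows could be written to a file and plotted as a 2D colored map.

```python
import math
from collections import defaultdict


def xy_cell_counts(points, labels, cell):
    """Bin points into square XY cells of side `cell` (z is ignored) and return
    {(ix, iy): [n_error, n_success]} for every non-empty cell."""
    counts = defaultdict(lambda: [0, 0])
    for (x, y, _z), lab in zip(points, labels):
        key = (math.floor(x / cell), math.floor(y / cell))
        counts[key][0 if lab == 1 else 1] += 1  # index 0: errors, 1: successes
    return counts


def xy_density_rows(points, labels, cell):
    """Rows (x_center, y_center, n_points, error_fraction), sorted by cell index."""
    rows = []
    for (ix, iy), (ne, ns) in sorted(xy_cell_counts(points, labels, cell).items()):
        rows.append(((ix + 0.5) * cell, (iy + 0.5) * cell, ne + ns, ne / (ne + ns)))
    return rows
```

The third column gives the point density per cell and the fourth the share of errors, so either can be used for the color.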
I am looking to produce an EPS file to include as a figure in LaTeX, with the quality that gnuplot or pgfplots provides.
Regards
The code below is a slight modification of the solution here (How to plot (x,y,z) points showing their density). For each data point, the occurrences of errors and successes are counted in a box of size 2*DeltaX x 2*DeltaY x 2*DeltaZ centered on that point.
The results are stored in a file, so you have to do the counting only once (your 10,000-line data file took about 1 h 15 min on my old PC). The gnuplot code can probably be made more efficient. After that, use the second script below to plot the resulting file quickly. I am not sure which coloring scheme is best; you can experiment with the color palette. Just as an example, the code below uses red (-1) for the maximum error count (i.e. error density) and green (+1) for the maximum success density. I hope this is helpful as a starting point for further optimization.
### 3D density plot
reset session
FILE = "data_all.dat"
DeltaX = 0.5 # half boxwidth
DeltaY = 0.5 # half boxlength
DeltaZ = 0.5 # half boxheight
TimeStart = time(0.0)
# put the datafile/dataset into arrays
stats FILE nooutput
RowCount = STATS_records
array ColX[RowCount]
array ColY[RowCount]
array ColZ[RowCount]
array ColR[RowCount] # Result 0=Success, 1=Error
array ColCE[RowCount] # Counts Error
array ColCS[RowCount] # Counts Success
# a single pass over the file fills all arrays ($0 is the 0-based index of the data line)
set table $Dummy
plot FILE u (ColX[$0+1]=$1,0):(ColY[$0+1]=$2,0):(ColZ[$0+1]=$3,0):(ColR[$0+1]=$4,0) with table
unset table
# look at each datapoint and its surroundings
Error = 1
Success = 0
# 1 if (x,y,z) lies inside the box of half-sizes DeltaX, DeltaY, DeltaZ around (x0,y0,z0), else 0
InBox(x,y,z) = abs(x0-x)<DeltaX && abs(y0-y)<DeltaY && abs(z0-z)<DeltaZ
do for [i=1:RowCount] {
print sprintf("Datapoint %d of %d",i,RowCount)
x0 = ColX[i]
y0 = ColY[i]
z0 = ColZ[i]
# count the Error (first column, sum_x) and Success (second column, sum_y) datapoints inside the box;
# summing 0/1 flags with stats also gives 0 correctly when a class has no datapoint in the box
stats FILE u (InBox($1,$2,$3) && $4==Error):(InBox($1,$2,$3) && $4==Success) nooutput
ColCE[i] = STATS_sum_x
ColCS[i] = STATS_sum_y
}
# put the arrays into a dataset again
set print $Data
do for [i=1:RowCount] {
print sprintf("%g\t%g\t%g\t%g\t%g\t%g",ColX[i],ColY[i],ColZ[i],ColR[i],ColCE[i],ColCS[i])
}
set print
stats $Data u 5:6 nooutput
CEmax = STATS_max_x
CSmax = STATS_max_y
print CEmax, CSmax
TimeEnd = time(0.0)
print sprintf("Duration: %.3f sec",TimeEnd-TimeStart)
set print "data_all_color.dat"
print $Data
set print
set palette defined (-1 "red", 0 "white", 1 "green")
splot $Data u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.5 pt 7 lc palette z notitle
### end of code
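The counting loop is the slow part. As an alternative, here is a Python sketch (standard library only) that produces the same six tab-separated columns as data_all_color.dat (x, y, z, class, error count, success count). It assumes data_all.dat has four whitespace-separated columns x y z class, possibly with # comment lines; the function names are hypothetical and it has not been timed on your data. The speed-up comes from hashing the points into grid cells whose edges equal the half-widths DeltaX, DeltaY, DeltaZ: any point inside the box around (x0,y0,z0) must lie in one of the 27 cells surrounding the cell of (x0,y0,z0), so only those candidates are tested, with the same strict inequalities as the gnuplot script.

```python
import math
from collections import defaultdict
from itertools import product


def parse_points(lines):
    """Read 'x y z class' lines; skip blank lines, short lines and '#' comments."""
    points, labels = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        x, y, z, c = (float(v) for v in fields[:4])
        points.append((x, y, z))
        labels.append(int(c))
    return points, labels


def box_counts(points, labels, dx, dy, dz):
    """For every point (x0, y0, z0) return (n_error, n_success): the points with
    label 1 / label 0, the point itself included, that satisfy
    |x-x0| < dx, |y-y0| < dy and |z-z0| < dz."""
    def cell(x, y, z):
        return (math.floor(x / dx), math.floor(y / dy), math.floor(z / dz))

    # hash point indices into cells whose edge lengths equal the half-widths
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell(*p)].append(idx)

    result = []
    for x0, y0, z0 in points:
        cx, cy, cz = cell(x0, y0, z0)
        n_err = n_succ = 0
        # any point inside the box lies in one of the 27 neighbouring cells
        for ox, oy, oz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + ox, cy + oy, cz + oz), ()):
                x, y, z = points[j]
                if abs(x0 - x) < dx and abs(y0 - y) < dy and abs(z0 - z) < dz:
                    if labels[j] == 1:
                        n_err += 1
                    else:
                        n_succ += 1
        result.append((n_err, n_succ))
    return result


def color_file_lines(points, labels, dx, dy, dz):
    """Tab-separated lines: x, y, z, class, error count, success count."""
    counts = box_counts(points, labels, dx, dy, dz)
    return [f"{x:g}\t{y:g}\t{z:g}\t{lab:g}\t{ne:g}\t{ns:g}\n"
            for (x, y, z), lab, (ne, ns) in zip(points, labels, counts)]
```

Usage: open data_all.dat and pass the file object to parse_points, then write color_file_lines(points, labels, 0.5, 0.5, 0.5) to data_all_color.dat with writelines; the plotting script can then read that file unchanged.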
Once you have counted the occurrences, simply plot the new data file and experiment with the color palette.
### 3D density plot
reset session
FILE = "data_all_color.dat"
stats FILE u 5:6 nooutput # get the maximum counts of Error and Success
CEmax = STATS_max_x
CSmax = STATS_max_y
print CEmax, CSmax
set ztics 0.2
set view 50,70
set palette defined (-1 "red", 0 "white", 1 "green")
splot FILE u 1:2:3:($4==1? -$5/CEmax : $6/CSmax) w p ps 0.2 pt 7 lc palette z notitle
### end of code
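Both scripts pass ($4==1? -$5/CEmax : $6/CSmax) to the palette. A small Python sketch of that mapping (hypothetical function name, assuming the counts come from the counting step, where every point counts itself so CEmax and CSmax are at least 1 whenever the corresponding class exists) makes the rule explicit: error points land in [-1, 0] (the red end) and success points in [0, 1] (the green end). Note that each point's color depends only on the count of its own class; the weighting proposed in the question would combine both counts.

```python
def signed_colors(labels, counts):
    """Reproduce ($4==1 ? -$5/CEmax : $6/CSmax) for every point.

    labels: 1 = error, 0 = success; counts: (n_error, n_success) per point.
    CEmax and CSmax are the maxima of the error and success counts over all points.
    """
    ce_max = max(ne for ne, _ in counts)
    cs_max = max(ns for _, ns in counts)
    # error points -> [-1, 0] (red end), success points -> [0, 1] (green end)
    return [-ne / ce_max if lab == 1 else ns / cs_max
            for lab, (ne, ns) in zip(labels, counts)]
```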
For example, with your data:
Addition:
Columns $5 and $6 currently contain the absolute numbers of Error and Success occurrences in the counting volume, respectively. If you want an error probability (I am not a statistician), my guess would be that you have to divide the number of Error occurrences $5 by the total number of events $5+$6 in this volume:
splot FILE u 1:2:3:($5/($5+$6)) w p ps 0.2 pt 7 lc palette z notitle
The palette for the other example was
set palette rgb 33,13,10
In general, type help palette in gnuplot; you'll find a lot of details there.
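The error-probability ratio $5/($5+$6) from the addition above, as a Python sketch with a guard for empty boxes (the name error_probability is hypothetical). Because each counting box is centered on a data point, that point is always counted, so $5+$6 is at least 1 there; the guard only matters if the function is reused for boxes not centered on data points. Since the result lies in [0, 1], fixing the color range with set cbrange [0:1] keeps the colors comparable between plots.

```python
def error_probability(n_err, n_succ):
    """Fraction of error events in a box, n_err / (n_err + n_succ), i.e. the
    gnuplot expression ($5/($5+$6)); NaN if the box is empty."""
    total = n_err + n_succ
    return n_err / total if total > 0 else float("nan")
```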