I am trying to bin 3D coordinates.
I have coordinates of a molecule moving through a protein, from over 800 simulations... What I want is to bin these data to get means, variances and how many points I have in a bin.
I imagine it like this: the space containing my 3D coordinates is split up into smaller 3D cubes (3D bins) defined by breaks().
What I need is all my x,y,z coordinates in these smaller 3D bins to calculate the mean and variance of these data.
Does this make sense?
Any help is greatly appreciated.
My input looks like this:
x<-c(1.1,1.2,4.3)
y<-c(3.4,5.2,3.2)
z<-c(10.1,10.3,12)
dat <- data.frame(x=x,y=y,z=z)
and the output should be organised by bins with dat having additional info on which bin the coordinates belong to:
x y z bin_x bin_y bin_z
Here you go. I might be completely wrong here, as your question is hard to answer without some expected output. I went by your stated intention of calculating the mean and variance for each small cube, so I created a grouping variable.
#generate some example data with more points
set.seed(32587)
n=500
dat <- data.frame(x=runif(n,min=0,max=10),
y=runif(n,min=0,max=10),
z=runif(n,min=0,max=10))
#create bins (using 'cut', no need to do this manually or in a loop)
#I have removed the labels, so each bin is just a number.
#breaks have been changed to allow for actual binning
breaks<-seq(0,10,1)
dat$bin_x <- cut(dat$x, breaks=breaks, labels=FALSE)
dat$bin_y <- cut(dat$y, breaks=breaks, labels=FALSE)
dat$bin_z <- cut(dat$z, breaks=breaks, labels=FALSE)
#create grouping variable with some string formatting for readability
dat$bin_all <- with(dat, sprintf("%02d.%02d.%02d",bin_x,bin_y,bin_z))
head(dat)
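library(data.table)
#reshape to long format so x, y and z can be summarised in one call
m_dat <- melt(setDT(dat),measure.vars=c("x","y","z"))
#mean, variance and number of points per cube and per coordinate
res <- m_dat[,.(mean_value=mean(value),variance_value=var(value),
                n_value=.N),by=list(bin_all,variable)]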
res
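If you would rather stay in base R, the same per-cube summary can be sketched with table() and aggregate(). This is only a rough sketch under the same assumptions as above (coordinates between 0 and 10, unit cubes). The names counts, means, vars and res_wide are made up for illustration, and include.lowest=TRUE is my addition so that a point lying exactly on the first break is not turned into NA.

```r
# Same example data as above: 500 random points in a 10 x 10 x 10 box
set.seed(32587)
n <- 500
dat <- data.frame(x = runif(n, min = 0, max = 10),
                  y = runif(n, min = 0, max = 10),
                  z = runif(n, min = 0, max = 10))

# Assign each coordinate to a numbered bin along its axis
breaks <- seq(0, 10, 1)
dat$bin_x <- cut(dat$x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
dat$bin_y <- cut(dat$y, breaks = breaks, labels = FALSE, include.lowest = TRUE)
dat$bin_z <- cut(dat$z, breaks = breaks, labels = FALSE, include.lowest = TRUE)

# One label per cube, e.g. "03.07.01"
dat$bin_all <- with(dat, sprintf("%02d.%02d.%02d", bin_x, bin_y, bin_z))

# Number of points in each occupied cube
counts <- as.data.frame(table(bin_all = dat$bin_all),
                        responseName = "n", stringsAsFactors = FALSE)

# Mean and variance of x, y and z within each cube
means <- aggregate(cbind(x, y, z) ~ bin_all, data = dat, FUN = mean)
vars  <- aggregate(cbind(x, y, z) ~ bin_all, data = dat, FUN = var)
names(means)[-1] <- paste0("mean_", c("x", "y", "z"))
names(vars)[-1]  <- paste0("var_", c("x", "y", "z"))

# One row per cube: count, means and variances side by side
res_wide <- merge(merge(counts, means, by = "bin_all"), vars, by = "bin_all")
head(res_wide)
```

Note that var() returns NA for a cube that contains a single point, so with sparse data you may want to filter on n before using the variances.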