
algorithm for binning of 3D coordinates (in R or any other language)


I am trying to bin 3D coordinates.

I have coordinates of a molecule moving through a protein, from over 800 simulations... What I want is to bin these data to get means, variances and how many points I have in a bin.

I imagine it like this: the space containing my 3D coordinates is split up into smaller 3D cubes (3D bins) defined by a set of breaks along each axis.

What I need is all my x,y,z coordinates in these smaller 3D bins to calculate the mean and variance of these data.

Does this make sense?

Any help is greatly appreciated.

My input looks like this:

x<-c(1.1,1.2,4.3)
y<-c(3.4,5.2,3.2)
z<-c(10.1,10.3,12)
dat <- data.frame(x=x,y=y,z=z)

and the output should be organised by bins with dat having additional info on which bin the coordinates belong to:

x y z bin_x bin_y bin_z

Solution

  • Here you go. I might be completely wrong here, but your question is hard to answer without some expected output. I went by your intention to calculate the mean and variance for each small cube, so I created a grouping variable.

    #generate some example data with more points
    
    set.seed(32587)
    
    n=500
    dat <- data.frame(x=runif(n,min=0,max=10),
                      y=runif(n,min=0,max=10),
                      z=runif(n,min=0,max=10))
    
    
    #create bins (using 'cut', no need to do this manually or in a loop)
    #I have removed the labels, so each bin is just a number.
    
    #breaks have been changed to allow for actual binning 
    
    breaks<-seq(0,10,1)
    
    dat$bin_x <- cut(dat$x, breaks=breaks, labels=FALSE)
    dat$bin_y <- cut(dat$y, breaks=breaks, labels=FALSE)
    dat$bin_z <- cut(dat$z, breaks=breaks, labels=FALSE)
    
    #create grouping variable with some string formatting for readability
    dat$bin_all <- with(dat, sprintf("%02d.%02d.%02d",bin_x,bin_y,bin_z))
    
    head(dat)
    
    
    library(data.table)
    
    m_dat <- melt(setDT(dat),measure.vars=c("x","y","z"))
    
    
    res <- m_dat[,.(mean_value=mean(value),variance_value=var(value),
                    n_value=.N),by=list(bin_all,variable)]
    res
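
The code above assumes the coordinates fall between 0 and 10. Real simulation coordinates probably will not, so here is a rough sketch of the same idea with the bins derived from the data instead of a fixed `breaks` vector. The helper names `bin_index` and `summarise_bins`, the `width` parameter and the toy data are made up for illustration and are not part of the original answer. Note one difference: the bin index is computed arithmetically, so bins are closed on the left, `[lower, upper)`, whereas `cut` by default closes them on the right.

```r
library(data.table)

# Hypothetical helper: 1-based bin index for cubes of side `width`.
# Bins start at the largest multiple of `width` at or below min(v)
# and are closed on the left: [lower, upper).
bin_index <- function(v, width) {
  origin <- floor(min(v) / width) * width
  as.integer(floor((v - origin) / width)) + 1L
}

# Hypothetical helper: one row per occupied cube, with the point count
# and the mean and variance of each coordinate inside that cube.
summarise_bins <- function(dat, width = 1) {
  dt <- copy(as.data.table(dat))  # work on a copy, leave the input alone
  dt[, `:=`(bin_x = bin_index(x, width),
            bin_y = bin_index(y, width),
            bin_z = bin_index(z, width))]
  # var() returns NA for cubes that hold a single point
  dt[, .(n = .N,
         mean_x = mean(x), mean_y = mean(y), mean_z = mean(z),
         var_x = var(x), var_y = var(y), var_z = var(z)),
     keyby = .(bin_x, bin_y, bin_z)]
}

# Toy data (invented): the first two points share a cube, the third is alone.
toy <- data.frame(x = c(0.2, 0.4, 3.5),
                  y = c(1.1, 1.3, 2.5),
                  z = c(5.5, 5.7, 9.0))
res <- summarise_bins(toy, width = 1)
res
```

This gives a wide table (one row per cube) rather than the long `melt` layout; which one is more convenient depends on what you do with the result afterwards.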