r, median

Calculate median from x, y data in R


I have a data frame describing a population of particles by size. The first column holds the size (x value) and the other columns hold the density (y values) at that size. I need to calculate the median size for each of the density columns. Since median() expects raw observations rather than counts, I decided to expand my dataset: for each row I repeat the value of the first column N times, where N is the value in the current column, and collect the results in a vector. This works, but it is really slow with my 1200-row data frames, so I wonder if you have a more efficient solution.

df <- data.frame(Size = 1:100,
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))

get.median <- function(dataset){
  results <- list()
  for(col in colnames(dataset)[2:ncol(dataset)]){
    col.results <- c()
    for(i in 1:nrow(dataset)){
      size <- dataset[i,"Size"]
      count <- dataset[i,col]
      out <- rep(size,count)
      col.results <- c(col.results,out)
    }
    med <- median(col.results)
    results <- append(results,med)
  }
  return(results)  
}

get.median(df)

Solution

  • Without the nested loops, since rep() accepts a vector of counts in its times argument:

    lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
    $val1
    [1] 49
    
    $val2
    [1] 47
    

    data:

    set.seed(99)
    df <- data.frame(Size = 1:100,
                     val1 = sample(0:9, 100, replace = TRUE),
                     val2 = sample(0:9, 100, replace = TRUE))
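
  • If building the expanded vector is itself too costly, the median can probably be read straight from the cumulative counts. The following is only a sketch, not part of the original answer: freq_median is a hypothetical helper name, and it assumes the Size column is sorted in ascending order and that the densities are non-negative whole numbers, as in the example data. It finds the row(s) where the running total reaches the middle position(s) of the expanded vector, which is what median() would pick after rep().

```r
# Median of a frequency table, computed from cumulative counts
# instead of materialising rep(sizes, counts).
# Assumptions (not checked here): `sizes` is sorted ascending and
# `counts` holds non-negative whole numbers.
freq_median <- function(sizes, counts) {
  n <- sum(counts)
  if (n == 0) return(NA_real_)        # no observations at all
  cum <- cumsum(counts)
  # Positions of the middle observation(s) in the virtual expanded vector:
  # identical for odd n, adjacent for even n
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)
  # First row whose cumulative count reaches each position
  v_lo <- sizes[which(cum >= lo)[1]]
  v_hi <- sizes[which(cum >= hi)[1]]
  (v_lo + v_hi) / 2
}

set.seed(99)
df <- data.frame(Size = 1:100,
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))

# One median per density column, returned as a named numeric vector
fast <- vapply(df[-1], freq_median, numeric(1), sizes = df$Size)

# Cross-check against the rep()-based expansion
slow <- vapply(df[-1], function(y) median(rep(df$Size, times = y)), numeric(1))
stopifnot(all(fast == slow))
```

    Using df[-1] instead of df[,2:3] picks up every density column regardless of how many there are, and vapply returns a plain named vector rather than a list. Whether this is noticeably faster than the rep() version depends on how large the counts are; for small counts like these the difference is likely negligible.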