r, dataframe, for-loop, unique, kernel-density

Extract density estimation for different groups


I have a data frame (df) that looks like this:

> summary(df)
   Occurence        Group          
 Min.   :0.001   Length:7990       
 1st Qu.:0.028   Class :character  
 Median :0.160   Mode  :character  
 Mean   :0.195                     
 3rd Qu.:0.307                     
 Max.   :0.600                     
 NA's   :5473

> unique(df$Group)
 [1] "fa20,0"   "sa20,0"   "fa05,0"   "sa10,0"   "flatsa,0" "flatfa,0" "fa10,0"   "sa05,0"   "flatsa,1" "fa10,1"   "fa05,1"   "sa20,1"   "flatfa,1" "fa20,1"   "sa10,1"   "sa05,1" 

I am trying to get a kernel density estimate of Occurence for each unique Group with the density() function. I am able to do it one group at a time:

> flatsa <- density(c(as.numeric(ag04_pattern_long$Occurence[ag04_pattern_long$Group == "flatsa,0"])), na.rm=T)

> flatsa_df2 <- enframe(flatsa$x, value = "X") %>%
+     add_column(Y=flatsa$y) %>%
+     add_column(Group = "flatsa,0") %>%
+     select(-name)

Which produces this output for flatsa_df2:

# A tibble: 512 x 3
        X       Y Group   
    <dbl>   <dbl> <chr>   
 1 -0.168 0.00317 flatsa,0
 2 -0.166 0.00351 flatsa,0
 3 -0.164 0.00387 flatsa,0
 4 -0.162 0.00427 flatsa,0
 5 -0.161 0.00471 flatsa,0
 6 -0.159 0.00519 flatsa,0
 7 -0.157 0.00570 flatsa,0
 8 -0.155 0.00628 flatsa,0
 9 -0.153 0.00689 flatsa,0
10 -0.151 0.00755 flatsa,0
# ... with 502 more rows

How can I do this for all 16 unique elements in df$Group at once? Ideally they will all go into one single dataframe. I have tried:

dens_table <- setDT(ag04_pattern_long)[, .(dens=density(ag04_pattern_long$Occurence, na.rm=T)), by = Group]

for(i in length(unique(ag04_pattern_long$Group))){
  dens_table <- density(c(as.numeric(ag04_pattern_long$Occurence[i], na.rm=T)))
}

But neither of these produces the correct output. The loop gives me an error saying that it needs "at least 2 points to select a bandwidth". I think this means it is not taking all df$Occurence values for each unique(df$Group) into account.

Help!


Solution

  • Here's a base R method:

    occur_list = split(df$Occurence, df$Group)
    est_list = lapply(occur_list, function(x) {
      data.frame(density(x, na.rm = TRUE)[c("x", "y")])
    })
    results = do.call(rbind, est_list)
    # times (not each) repeats every group name once per row of its estimate
    results$Group = rep(names(occur_list), times = sapply(est_list, nrow))
    

    We could also use a for loop, adapting your attempt:

    results = list()
    for(i in unique(ag04_pattern_long$Group)){
      results[[i]] <- data.frame(density(ag04_pattern_long$Occurence[ag04_pattern_long$Group == i], na.rm = TRUE)[c("x", "y")])
      results[[i]]$Group = i
    }
    results = do.call(rbind, results)
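
For context, the loop in the question most likely fails for two reasons, sketched below on made-up data (the `toy` data frame and its values are assumptions, not your data). `for (i in length(x))` runs exactly once, with `i` equal to the number of groups, and `Occurence[i]` then selects a single value, which is too few points for `density()` to choose a bandwidth.

```r
# Toy stand-in for the real data (hypothetical values)
set.seed(1)
toy <- data.frame(
  Occurence = runif(40),
  Group = rep(c("a,0", "b,0"), each = 20)
)

groups <- unique(toy$Group)

# length(groups) is a single number, so this loop body runs only once
iterations <- 0
for (i in length(groups)) iterations <- iterations + 1

# A single value is not enough for automatic bandwidth selection
msg <- tryCatch(
  {
    density(toy$Occurence[length(groups)])
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Iterating over the group labels themselves visits every group,
# and the logical index picks up all of that group's values
n_per_group <- sapply(groups, function(g) sum(toy$Group == g))
```

Looping over `unique(...)` directly, as in the loop above, avoids both problems because `i` is then a group label rather than a count.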
    

    Or using dplyr:

    df %>% 
      nest_by(Group) %>%
      mutate(dens = list(data.frame(density(data$Occurence, na.rm = TRUE)[c("x", "y")]))) %>%
      select(-data) %>%
      unnest(cols = dens)
    

    In all cases I've removed the c(as.numeric()) wrapper from the density() call. Make sure the entire Occurence column is numeric before looping; that is better than converting each piece of the column inside the loop.
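
A minimal sketch of that up-front conversion, again on made-up data: the column is converted once, and the per-group estimates are then built from the already-numeric column. The `toy` data frame and its values are assumptions for illustration only.

```r
# Toy data where the measurements arrived as character strings (hypothetical)
toy <- data.frame(
  Occurence = c("0.10", "0.25", "0.40", NA, "0.05", "0.30", "0.55", "0.20"),
  Group = rep(c("a,0", "b,0"), each = 4),
  stringsAsFactors = FALSE
)

# Convert the whole column once, before any grouping or looping
toy$Occurence <- as.numeric(toy$Occurence)

# Per-group density estimates; no conversion is needed inside the function
est_list <- lapply(split(toy$Occurence, toy$Group), function(x) {
  data.frame(density(x, na.rm = TRUE)[c("x", "y")])
})

# Stack the estimates and label each row with its group
results <- do.call(rbind, est_list)
results$Group <- rep(names(est_list), times = sapply(est_list, nrow))
```

With the default settings `density()` evaluates the estimate at 512 points, so `results` here should have 512 rows per group.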