Understanding why sapply in R returns vector of longer length


I am parsing through individual lines of code in R to get a better understanding of a large function. While I know sapply applies a function over a vector, I am having trouble understanding what it is doing in a specific instance. Unfortunately, I cannot find a clear explanation of this exact question elsewhere.

If you simulate data with the sample code below, the variable Y is a vector with 2000 values and calc_sizes holds the unique values in Y (87 unique values). sapply is applied in the context of likelihoods[calc_sizes]<-sapply(calc_sizes, nb.likelihood), after which likelihoods is a vector of 688 values.

What is sapply doing here, and how/why does it return a vector of 688 and not 87?

To be clear, the code is functioning properly - there is no technical issue. I would just prefer to understand and learn as opposed to blindly writing code that eventually worked (and so I don't have to bug this forum so often).

You should be able to copy and paste the code verbatim to get the results mentioned. Thank you for any insight!

#################################################################
#Functions that are needed to generate and apply to sample data
#################################################################
bp <- function(gens=20, init.size=1, offspring, ...){  
  Z <- list() #initiate the list
  Z[[1]] <- init.size #set the first position of the list as the number of index cases
  i <- 1 
  while(sum(Z[[i]]) > 0 && i <= gens) { 
    Z[[i+1]] <- offspring(sum(Z[[i]]), ...) 
    i <- i+1 
  } 
  return(Z)
}
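# Note: k and r0 are not defined anywhere in this snippet; nb.likelihood reads
# them from the global environment, so both must exist before it is called.
nb.likelihood<-function(x){
  lgamma(k*x+(x-1))-(lgamma(k*x)+lgamma(x+1))+(x-1)*log(r0/k)-(k*x+(x-1))*log(1+r0/k)    
}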

####################
#Generate sample data
####################
set.seed(123)
Z<-replicate(n=2000,bp(offspring=rnbinom,mu=0.9,size=0.25)) 
Y<-unlist(lapply(Z,function(x) sum(unlist(x))))

#Generate variables in question 
calc_sizes<-unique(c(1,Y))
likelihoods<-c()
#########################
#line of code in question
#########################
likelihoods[calc_sizes]<-sapply(calc_sizes, nb.likelihood) 

Solution

  • The reason for 688 is that you are assigning values into the empty vector likelihoods at the unordered index positions given by the values of calc_sizes. calc_sizes comprises 87 integer values; notice that the smallest is 1 and the largest is 688. Assigning to position 688 extends the vector to length 688.

    calc_sizes
    
     [1]   1  40   2   3 180   4  19  21  10  49   6  18   5  11  81  23 186 189   8
    [20]  41  27  20   7  12  68   9  25  16 131  15  51  26  22  17 648  24  30  32
    [39]  28  36  53  96  47  14 548 109  38  31  99 106  34  39 607 100 233  42  66
    [58] 129  33 170 102  70  13  46  63  79  97 447 460 688 346 130  44  69 620 113
    [77]  92 256 153  78 462 325  61  64  54  76  86
    

    The sapply call itself, however, returns only 87 values:

    sapply(calc_sizes, nb.likelihood) 
    
     [1]  -1.098612 -11.788381  -2.602690  -3.413620 -30.541489  -4.001407
     [7]  -8.187640  -8.575276  -6.146104 -13.154588  -4.881330  -7.987619
    [13]  -4.475865  -6.410490 -17.680645  -8.948901 -31.297439 -31.674821
    [19]  -5.565697 -11.943435  -9.663029  -8.383385  -5.240275  -6.661805
    [25] -15.886160  -5.865802  -9.310881  -7.572668 -24.292698  -7.356445
    [31] -13.450466  -9.488087  -8.763676  -7.782825 -87.586858  -9.131222
    [37] -10.175841 -10.509013  -9.835872 -11.158148 -13.744013 -19.702968
    [43] -12.856183  -7.133302 -75.557026 -21.425116 -11.475373 -10.343221
    [49] -20.102594 -21.029806 -10.836219 -11.632377 -82.659672 -20.235490
    [55] -37.171587 -12.097586 -15.605646 -24.034010 -10.673316 -29.277798
    [61] -20.500835 -16.165367  -6.902190 -12.705964 -15.182243 -17.407459
    [67] -19.836336 -63.355209 -64.929415 -92.388061 -51.074698 -24.163398
    [73] -12.403344 -16.025922 -84.222650 -21.950431 -19.167825 -40.021951
    [79] -27.117199 -17.270506 -65.171492 -48.507264 -14.898089 -15.323743
    [85] -13.889967 -16.995849 -18.359679
    

    So the assignment places each of the 87 sapply results at the position given by the corresponding element of calc_sizes. Every position up to 688 that receives no value is filled with NA.

    likelihoods[calc_sizes] <- sapply(calc_sizes, nb.likelihood) 
    
    likelihoods
    
      [1]  -1.098612  -2.602690  -3.413620  -4.001407  -4.475865  -4.881330
      [7]  -5.240275  -5.565697  -5.865802  -6.146104  -6.410490  -6.661805
     [13]  -6.902190  -7.133302  -7.356445  -7.572668  -7.782825  -7.987619
     [19]  -8.187640  -8.383385  -8.575276  -8.763676  -8.948901  -9.131222
     [25]  -9.310881  -9.488087  -9.663029  -9.835872         NA -10.175841
     [31] -10.343221 -10.509013 -10.673316 -10.836219         NA -11.158148
     [37]         NA -11.475373 -11.632377 -11.788381 -11.943435 -12.097586
     [43]         NA -12.403344         NA -12.705964 -12.856183         NA
     [49] -13.154588         NA -13.450466         NA -13.744013 -13.889967
     [55]         NA         NA         NA         NA         NA         NA
     [61] -14.898089         NA -15.182243 -15.323743         NA -15.605646
     [67]         NA -15.886160 -16.025922 -16.165367         NA         NA
     [73]         NA         NA         NA -16.995849         NA -17.270506
     [79] -17.407459         NA -17.680645         NA         NA         NA
     [85]         NA -18.359679         NA         NA         NA         NA
     [91]         NA -19.167825         NA         NA         NA -19.702968
     [97] -19.836336         NA -20.102594 -20.235490         NA -20.500835
    [103]         NA         NA         NA -21.029806         NA         NA
    [109] -21.425116         NA         NA         NA -21.950431         NA
    [115]         NA         NA         NA         NA         NA         NA
    [121]         NA         NA         NA         NA         NA         NA
    [127]         NA         NA -24.034010 -24.163398 -24.292698         NA
    [133]         NA         NA         NA         NA         NA         NA
    [139]         NA         NA         NA         NA         NA         NA
    [145]         NA         NA         NA         NA         NA         NA
    [151]         NA         NA -27.117199         NA         NA         NA
    [157]         NA         NA         NA         NA         NA         NA
    [163]         NA         NA         NA         NA         NA         NA
    [169]         NA -29.277798         NA         NA         NA         NA
    [175]         NA         NA         NA         NA         NA -30.541489
    [181]         NA         NA         NA         NA         NA -31.297439
    [187]         NA         NA -31.674821         NA         NA         NA
    [193]         NA         NA         NA         NA         NA         NA
    [199]         NA         NA         NA         NA         NA         NA
    [205]         NA         NA         NA         NA         NA         NA
    [211]         NA         NA         NA         NA         NA         NA
    [217]         NA         NA         NA         NA         NA         NA
    [223]         NA         NA         NA         NA         NA         NA
    [229]         NA         NA         NA         NA -37.171587         NA
    [235]         NA         NA         NA         NA         NA         NA
    [241]         NA         NA         NA         NA         NA         NA
    [247]         NA         NA         NA         NA         NA         NA
    [253]         NA         NA         NA -40.021951         NA         NA
    [259]         NA         NA         NA         NA         NA         NA
    [265]         NA         NA         NA         NA         NA         NA
    [271]         NA         NA         NA         NA         NA         NA
    [277]         NA         NA         NA         NA         NA         NA
    [283]         NA         NA         NA         NA         NA         NA
    [289]         NA         NA         NA         NA         NA         NA
    [295]         NA         NA         NA         NA         NA         NA
    [301]         NA         NA         NA         NA         NA         NA
    [307]         NA         NA         NA         NA         NA         NA
    [313]         NA         NA         NA         NA         NA         NA
    [319]         NA         NA         NA         NA         NA         NA
    [325] -48.507264         NA         NA         NA         NA         NA
    [331]         NA         NA         NA         NA         NA         NA
    [337]         NA         NA         NA         NA         NA         NA
    [343]         NA         NA         NA -51.074698         NA         NA
    [349]         NA         NA         NA         NA         NA         NA
    [355]         NA         NA         NA         NA         NA         NA
    [361]         NA         NA         NA         NA         NA         NA
    [367]         NA         NA         NA         NA         NA         NA
    [373]         NA         NA         NA         NA         NA         NA
    [379]         NA         NA         NA         NA         NA         NA
    [385]         NA         NA         NA         NA         NA         NA
    [391]         NA         NA         NA         NA         NA         NA
    [397]         NA         NA         NA         NA         NA         NA
    [403]         NA         NA         NA         NA         NA         NA
    [409]         NA         NA         NA         NA         NA         NA
    [415]         NA         NA         NA         NA         NA         NA
    [421]         NA         NA         NA         NA         NA         NA
    [427]         NA         NA         NA         NA         NA         NA
    [433]         NA         NA         NA         NA         NA         NA
    [439]         NA         NA         NA         NA         NA         NA
    [445]         NA         NA -63.355209         NA         NA         NA
    [451]         NA         NA         NA         NA         NA         NA
    [457]         NA         NA         NA -64.929415         NA -65.171492
    [463]         NA         NA         NA         NA         NA         NA
    [469]         NA         NA         NA         NA         NA         NA
    [475]         NA         NA         NA         NA         NA         NA
    [481]         NA         NA         NA         NA         NA         NA
    [487]         NA         NA         NA         NA         NA         NA
    [493]         NA         NA         NA         NA         NA         NA
    [499]         NA         NA         NA         NA         NA         NA
    [505]         NA         NA         NA         NA         NA         NA
    [511]         NA         NA         NA         NA         NA         NA
    [517]         NA         NA         NA         NA         NA         NA
    [523]         NA         NA         NA         NA         NA         NA
    [529]         NA         NA         NA         NA         NA         NA
    [535]         NA         NA         NA         NA         NA         NA
    [541]         NA         NA         NA         NA         NA         NA
    [547]         NA -75.557026         NA         NA         NA         NA
    [553]         NA         NA         NA         NA         NA         NA
    [559]         NA         NA         NA         NA         NA         NA
    [565]         NA         NA         NA         NA         NA         NA
    [571]         NA         NA         NA         NA         NA         NA
    [577]         NA         NA         NA         NA         NA         NA
    [583]         NA         NA         NA         NA         NA         NA
    [589]         NA         NA         NA         NA         NA         NA
    [595]         NA         NA         NA         NA         NA         NA
    [601]         NA         NA         NA         NA         NA         NA
    [607] -82.659672         NA         NA         NA         NA         NA
    [613]         NA         NA         NA         NA         NA         NA
    [619]         NA -84.222650         NA         NA         NA         NA
    [625]         NA         NA         NA         NA         NA         NA
    [631]         NA         NA         NA         NA         NA         NA
    [637]         NA         NA         NA         NA         NA         NA
    [643]         NA         NA         NA         NA         NA -87.586858
    [649]         NA         NA         NA         NA         NA         NA
    [655]         NA         NA         NA         NA         NA         NA
    [661]         NA         NA         NA         NA         NA         NA
    [667]         NA         NA         NA         NA         NA         NA
    [673]         NA         NA         NA         NA         NA         NA
    [679]         NA         NA         NA         NA         NA         NA
    [685]         NA         NA         NA -92.388061
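
The same effect can be reproduced on a toy vector. This is a minimal sketch with made-up values (idx, vals and v are hypothetical names, not part of the code in the question): assigning three values to the positions 1, 5 and 3 of an empty vector yields a vector of length 5, with NA in the two untouched positions.

```r
# Toy illustration with hypothetical values, not the simulated data above.
idx  <- c(1, 5, 3)      # unordered positions; the largest is 5
vals <- c(10, 20, 30)   # one value per position

v <- c()                # empty vector (NULL)
v[idx] <- vals          # index assignment grows v up to max(idx)

print(v)
# [1] 10 NA 30 NA 20

length(v) == max(idx)           # TRUE: length is set by the largest index
sum(!is.na(v)) == length(idx)   # TRUE: only 3 positions hold a value
```

Applied to the question, max(calc_sizes) is 688, so likelihoods ends up with 688 positions, of which only 87 are filled.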
    

    Notice that the 70th term of calc_sizes is 688 and the 70th term of sapply(calc_sizes, nb.likelihood) is -92.38806, which is the last numeric value above, stored at position 688.

    calc_sizes[70]
    # [1] 688
    
    sapply(calc_sizes, nb.likelihood)[70]
    # [1] -92.38806
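
A side effect of this layout is that the position of each value equals the size it was computed for, so the vector can be read as a lookup table. The sketch below is only an illustration of that reading, not a statement about what the original function intends. The question does not define k and r0, so k <- 1 and r0 <- 2 are assumptions, chosen because they give nb.likelihood(1) = -log(3), which is about -1.098612 and matches the first value printed above. The names sizes, lik and obs are hypothetical.

```r
# Assumed parameters: not given in the question, chosen to match the output above.
k  <- 1
r0 <- 2

nb.likelihood <- function(x) {
  lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
    (x - 1) * log(r0/k) - (k*x + (x - 1)) * log(1 + r0/k)
}

sizes <- c(1, 40, 2, 688)   # a few of the sizes listed in calc_sizes

lik <- c()
lik[sizes] <- sapply(sizes, nb.likelihood)

length(lik)                             # 688, the largest index used
all.equal(lik[688], nb.likelihood(688)) # TRUE: position 688 holds the value for size 688

# Look up the value for each observed size directly by position
obs <- c(2, 2, 40, 1)       # hypothetical observations
lik[obs]
```

If the surrounding function later indexes likelihoods by the values in Y, the NA gaps are harmless, because only positions that occur in calc_sizes are ever read.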
    

    To get a vector of length 87 instead, index along the length of calc_sizes, i.e. assign to positions 1:87. Also, consider initializing likelihoods with its final length instead of growing the object from c().

    likelihoods <- vector(mode="numeric", length=length(calc_sizes))
    
    likelihoods[seq_along(calc_sizes)] <- sapply(calc_sizes, nb.likelihood) 
    # likelihoods[1:length(calc_sizes)] <- sapply(calc_sizes, nb.likelihood) 
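
As a small sketch of the preallocation idea, the block below uses a stand-in function f and a short sizes vector (both hypothetical, so the example runs on its own). It also shows vapply, which is a base R alternative to sapply that checks each result against a declared template, here a single numeric value.

```r
# Stand-in for nb.likelihood so the example is self-contained (hypothetical).
f <- function(x) -log(x + 1)

sizes <- c(1, 40, 2, 3)

# Preallocate to the final length, then fill positions 1..length(sizes)
out <- numeric(length(sizes))   # same as vector(mode = "numeric", length = ...)
out[seq_along(sizes)] <- sapply(sizes, f)

# vapply returns the same values and errors if f ever returns
# something other than one numeric value
out2 <- vapply(sizes, f, numeric(1))

length(out)             # 4, one entry per element of sizes
all.equal(out, out2)    # TRUE
```

seq_along(sizes) is generally safer than 1:length(sizes), because it yields an empty sequence when sizes is empty, whereas 1:0 counts down to c(1, 0).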
    

    Better still, since the apply family functions return an object, simply assign the result of the sapply call directly, without initializing anything.

    likelihoods <- sapply(calc_sizes, nb.likelihood) 
    
    likelihoods
     [1]  -1.098612 -11.788381  -2.602690  -3.413620 -30.541489  -4.001407
     [7]  -8.187640  -8.575276  -6.146104 -13.154588  -4.881330  -7.987619
    [13]  -4.475865  -6.410490 -17.680645  -8.948901 -31.297439 -31.674821
    [19]  -5.565697 -11.943435  -9.663029  -8.383385  -5.240275  -6.661805
    [25] -15.886160  -5.865802  -9.310881  -7.572668 -24.292698  -7.356445
    [31] -13.450466  -9.488087  -8.763676  -7.782825 -87.586858  -9.131222
    [37] -10.175841 -10.509013  -9.835872 -11.158148 -13.744013 -19.702968
    [43] -12.856183  -7.133302 -75.557026 -21.425116 -11.475373 -10.343221
    [49] -20.102594 -21.029806 -10.836219 -11.632377 -82.659672 -20.235490
    [55] -37.171587 -12.097586 -15.605646 -24.034010 -10.673316 -29.277798
    [61] -20.500835 -16.165367  -6.902190 -12.705964 -15.182243 -17.407459
    [67] -19.836336 -63.355209 -64.929415 -92.388061 -51.074698 -24.163398
    [73] -12.403344 -16.025922 -84.222650 -21.950431 -19.167825 -40.021951
    [79] -27.117199 -17.270506 -65.171492 -48.507264 -14.898089 -15.323743
    [85] -13.889967 -16.995849 -18.359679
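
Because nb.likelihood is built only from lgamma, log and arithmetic, which all operate elementwise on vectors in R, it can likely be called on the whole vector without sapply. The sketch below checks this on a few sizes and attaches the sizes as names so the link between size and value is kept without NA gaps. As before, k <- 1 and r0 <- 2 are assumptions (the question does not define them), and sizes, by_loop, by_vec and named are hypothetical names.

```r
# Assumed parameters: not given in the question.
k  <- 1
r0 <- 2

nb.likelihood <- function(x) {
  lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
    (x - 1) * log(r0/k) - (k*x + (x - 1)) * log(1 + r0/k)
}

sizes <- c(1, 40, 2, 3)

by_loop <- sapply(sizes, nb.likelihood)   # one call per element
by_vec  <- nb.likelihood(sizes)           # one vectorized call

all.equal(by_loop, by_vec)                # TRUE

# Keep the size-to-value mapping via names instead of positions
named <- setNames(by_vec, sizes)
named[["40"]]                             # value computed for size 40
```

Looking values up by name, as in named[as.character(Y)], would then play the role that positional indexing plays in the original line.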
    
