How to sample a percentage of rows by group using data.table?

This post discusses a routine for sampling with different percentages by group.
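The linked routine is not reproduced here. As a minimal sketch of the same idea, assume the per-group fractions live in a small lookup table (the hypothetical `fracs`, keyed by `A`). You can then join each group's fraction onto the data and draw floor(frac * group size) rows from each group without replacement:

```r
library(data.table)

set.seed(7)
dt <- data.table(A = sample(1:3, 300, replace = TRUE), B = sample(300))

# Hypothetical lookup table: fraction of rows to draw from each group
fracs <- data.table(A = 1:3, frac = c(0.1, 0.5, 0.9))

# Attach each row's group fraction, then keep floor(frac * .N) randomly chosen
# rows per group (without replacement)
out <- fracs[dt, on = "A"][, .SD[sample(.N, floor(frac[1] * .N))], by = A]
```

Any group missing from `fracs` gets an `NA` fraction, and `sample()` will then fail. A real implementation would need to decide how to handle such groups.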

But what if you just want to sample, say, 50% of each group without replacement? Or 50% of each group with replacement?

In dplyr, sample_frac() does this. What is the data.table equivalent?
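For reference, the most direct data.table counterpart of a grouped `sample_frac(0.5)` indexes each group's `.SD` with `sample()`. This is a minimal sketch. It assumes "50%" means floor(n/2) rows per group, and `half_without` and `half_with` are illustrative names:

```r
library(data.table)

set.seed(1)
dt <- data.table(A = sample(1:10, 1e3, replace = TRUE), B = sample(1000))

# 50% of each group without replacement: floor(.N / 2) distinct rows per group
half_without <- dt[, .SD[sample(.N, .N %/% 2)], by = A]

# 50% of each group with replacement: a row may be drawn more than once
half_with <- dt[, .SD[sample(.N, .N %/% 2, replace = TRUE)], by = A]
```

Subsetting `.SD` is easy to read but regroups the data on every call. The `.I`-based approach in the solution avoids materialising `.SD` and is the better choice inside a simulation loop.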

Solution

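If the rows and groups of the data.table being sampled do not change during the simulation, you can compute each group's row indices once instead of regrouping on every replication. This more than doubles the speed over thousands of replications (5.06 s versus 1.90 s elapsed for 10,000 draws in the timings below). Both versions draw 50% of each group without replacement.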

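library(data.table)

# 1,000 rows with group labels 1 to 10 in A; B holds a unique value per row
dt <- data.table(A = sample(1:10, 1e3, replace = TRUE), B = sample(1000))

# Direct approach: regroup on every replication and keep half of each group's row numbers (.I)
system.time(for (i in 1:1e4) dt[dt[, .I[sample(.N, .N %/% 2)], A][[2]]])
#>    user  system elapsed 
#>    4.83    0.23    5.06
system.time({
  # Pre-compute each group's row indices once, as a list of integer vectors
  idx <- dt[, .(.(.I)), A][[2]]
  # Each replication only samples within the stored index vectors
  for (i in 1:1e4) dt[unlist(lapply(idx, function(x) sample(x, length(x) %/% 2)))]
})
#>    user  system elapsed 
#>    1.78    0.13    1.90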
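The timed code only covers sampling without replacement. The pre-computed indices extend to the with-replacement case and to other fractions. Below is a minimal sketch using a hypothetical helper, `sample_by_group`. It indexes each stored vector with `sample.int()` rather than calling `sample(x, ...)` directly. For a single numeric `x`, `sample(x, size)` draws from `1:x`, so a one-row group sampled with replacement would otherwise return row numbers outside the group.

```r
library(data.table)

set.seed(42)
dt <- data.table(A = sample(1:10, 1e3, replace = TRUE), B = sample(1000))

# Row indices of each group, computed once (list column extracted as a plain list)
idx <- dt[, .(.(.I)), by = A][[2]]

# Hypothetical helper: draw floor(frac * group size) row indices from every group.
# x[sample.int(...)] avoids sample()'s special case for length-1 numeric input.
sample_by_group <- function(idx, frac, replace = FALSE) {
  unlist(lapply(idx, function(x) {
    x[sample.int(length(x), floor(frac * length(x)), replace = replace)]
  }))
}

half_without <- dt[sample_by_group(idx, 0.5)]
half_with    <- dt[sample_by_group(idx, 0.5, replace = TRUE)]
```

Because data.table accepts repeated row numbers in `i`, the with-replacement result simply contains duplicated rows where an index was drawn more than once.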
