I want to write a function that will create n random samples of a data set without replacement.
In this example I am using the iris data set, which has 150 observations; say I want 10 samples (of 15 observations each).
My attempt:
#load libraries
library(dplyr)
# load the data
data(iris)
head(iris)
# store the data as df
df = iris
# set the number of samples
n = 10
# assumption: the number of observations in df is divisible by n
# set the number of observations in each sample
m = nrow(df)/n
# create a column called row to contain initial row index
df$row = rownames(df)
# define the for loop
# that creates n separate data sets
# with m number of rows in each data set
for(i in 1:n){
  # create the sample
  sample = sample_n(df, m, replace = FALSE)
  # name the sample 'dsi'
  x = assign(paste("ds",i,sep=""),sample)
  # remove 'dsi' from df
  df = df[!(df$row %in% x$row),]
}
When I run this code I get what I want: the random samples named ds1, ds2, ..., ds10.
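To confirm that the samples really are disjoint and together contain every row, they can be pulled into a list with base R's mget() and their row columns compared. This is a minimal sketch: check_disjoint is a hypothetical helper, and it assumes each sample carries the row column created with rownames(df) above.

```r
# Hypothetical helper: TRUE when the samples share no 'row' values and
# together contain n_total rows (assumes each sample has a 'row' column).
check_disjoint <- function(samples, n_total) {
  rows <- unlist(lapply(samples, function(s) s$row))
  # anyDuplicated() returns 0 when there are no duplicates
  !anyDuplicated(rows) && length(rows) == n_total
}

# At top level, after the loop has created ds1, ..., ds10:
# check_disjoint(mget(paste0("ds", 1:10)), nrow(iris))
```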
Now when I try to turn it into a function:
samplez <- function(df,n){
  df$row = rownames(df)
  m = nrow(df)/n
  for(i in 1:n){
    sample = sample_n(df, m, replace = FALSE)
    x = assign(paste("ds",i,sep=""),sample)
    df = df[!(df$row %in% x$row),]
  }
}
Nothing happens when I execute 'samplez(iris,10)'. What am I missing?
Thanks
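Inside a function, assign() creates ds1, ..., ds10 in the function's own local environment, which is discarded when the function exits. The function then returns the value of its last expression, the for loop, which is an invisible NULL, so nothing is printed either. Instead, save the results in a list and return that. Then you'll have a single object, the list of samples, in your global environment, rather than cluttering it up with a bunch of similar data frames.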
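A stripped-down reproduction of why samplez(iris,10) appears to do nothing; f and ds_demo are hypothetical stand-ins for samplez and ds1, and the expected results assume a fresh session where no ds_demo exists:

```r
# Hypothetical stand-in for samplez(), reduced to the two relevant lines
f <- function() {
  # assign() defaults to the current environment, which here is f's own
  # local frame, so 'ds_demo' is discarded when f returns
  assign("ds_demo", head(iris))
  # the for loop is the last expression, so f returns its value:
  # NULL, returned invisibly, so nothing is printed
  for (i in 1:2) NULL
}

res <- f()
exists("ds_demo")  # FALSE: nothing was created at top level
is.null(res)       # TRUE
```

Passing envir = globalenv() to assign() would make the objects appear, but returning a list keeps the function free of side effects.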
I'm not sure what you're trying to do with df, but here is how to return all of the samples. Let me know what you want to do with df and I can add that as well:
samplez <- function(df,n){
  samples = list()
  df$row = rownames(df)
  m = nrow(df)/n
  for(i in 1:n){
    samples[[paste0("ds",i)]] = sample_n(df, m, replace = FALSE)
    df = df[!(df$row %in% samples[[i]]$row),]
  }
  return(samples)
}
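For comparison, here is a base-R sketch of the same idea that needs neither dplyr nor the row column: deal group labels 1..n across the rows, shuffle the labels once, and split() the data frame by them. This is a swapped-in technique rather than a fix to the code above, and samplez_split is a hypothetical name. Because it uses rep_len(), it also works when nrow(df) is not divisible by n (group sizes then differ by at most one).

```r
# Hypothetical base-R alternative: partition df into n disjoint random samples
samplez_split <- function(df, n) {
  # deal labels 1..n across the rows, then shuffle them (without replacement)
  groups <- sample(rep_len(seq_len(n), nrow(df)))
  # split() returns a list of data frames, one per label; fixing the levels
  # keeps all n groups even if some are empty
  samples <- split(df, factor(groups, levels = seq_len(n)))
  names(samples) <- paste0("ds", seq_len(n))
  samples
}

set.seed(1)  # only to make the draw reproducible
res <- samplez_split(iris, 10)
sapply(res, nrow)  # 15 rows in each of ds1, ..., ds10

# If you really want ds1, ..., ds10 as separate objects in the workspace:
# list2env(res, envir = globalenv())
```

Unlike sample_n(), split() keeps the rows inside each sample in their original order (with their original row names), so only the assignment of rows to samples is random.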