R: Why won't my function create objects in my environment?


I want to write a function that will create n random samples of a data set without replacement.

In this example I am using the iris data set, which has 150 observations, and I want 10 samples.

My attempt:

#load libraries
library(dplyr)    

# load the data
data(iris)
head(iris)

# name df
df = iris

# set the number of samples
n = 10

# assumption: the number of observations in df is divisible by n
# set the number of observations in each sample
m = nrow(df)/n

# create a column called row to contain initial row index
df$row = rownames(df)

# define the for loop
# that creates n separate data sets
# with m number of rows in each data set

for(i in 1:n){
  # create the sample
  sample = sample_n(df, m, replace = FALSE) 

  # name the sample 'dsi'
  x = assign(paste("ds",i,sep=""),sample)

  # remove 'dsi' from df
  df = df[!(df$row %in% x$row),]

}

When I run this code I get what I want: random samples named ds1, ds2, ..., ds10.

Now when I try to turn it into a function:

samplez <- function(df,n){

  df$row = rownames(df)

  m = nrow(df)/n

  for(i in 1:n){

    sample = sample_n(df, m, replace = FALSE) 

    x = assign(paste("ds",i,sep=""),sample)

    df = df[!(df$row %in% x$row),]

  }

}

Nothing happens when I execute 'samplez(iris,10)'. What am I missing?

Thanks


Solution

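  • Inside a function, assign() creates ds1, ..., ds10 in the function's own environment, which is discarded when the function returns, so nothing reaches your global environment. Just save the results in a list and return that. Then you'll have a single object, the list of samples, in your global environment, rather than cluttering it up with a bunch of similar data frames.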

    I'm not sure what you're trying to do with df, but here is how to return all of the samples. Let me know what you want to do with df and I can add that as well:

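    library(dplyr)  # provides sample_n()

    samplez <- function(df, n) {
      # collect the samples here instead of calling assign()
      samples = list()

      # keep the original row index so sampled rows can be removed from df
      df$row = rownames(df)

      # rows per sample (assumes nrow(df) is divisible by n)
      m = nrow(df) / n

      for (i in 1:n) {
        # draw m of the remaining rows and store them under the name "dsi"
        samples[[paste0("ds", i)]] = sample_n(df, m, replace = FALSE)

        # drop the rows just sampled so later samples cannot reuse them
        df = df[!(df$row %in% samples[[i]]$row), ]
      }

      return(samples)
    }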
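
To see why the original function appeared to do nothing, and how the returned list can be used, here is a small base R sketch. The names make_with_assign, make_with_list, ds_demo and target are made up for illustration and are not part of the question; the two functions are simplified stand-ins for samplez, not a replacement for it.

```r
# assign() with default arguments writes into the environment it is called
# from. Inside a function that is the function's own environment, which is
# thrown away when the function returns.
make_with_assign <- function() {
  assign("ds_demo", head(iris, 3))
  # the object does exist here, inside the function
  exists("ds_demo", envir = environment(), inherits = FALSE)
}

created_inside  <- make_with_assign()
visible_outside <- exists("ds_demo", envir = globalenv(), inherits = FALSE)
# created_inside is TRUE; visible_outside is FALSE unless a ds_demo object
# was already defined in the global environment

# Returning a named list hands the data frames back to the caller instead.
make_with_list <- function() {
  list(ds1 = iris[1:3, ], ds2 = iris[4:6, ])
}

samples <- make_with_list()
names(samples)      # "ds1" "ds2"
nrow(samples$ds1)   # 3

# If separate objects are really wanted, list2env() copies each list element
# into an environment under its name. A fresh environment is used here to keep
# the demo tidy.
target <- new.env()
list2env(samples, envir = target)
ls(target)          # "ds1" "ds2"
```

With the answer's function, the same pattern would be samples <- samplez(iris, 10) followed by samples$ds1 or samples[["ds1"]]. Passing envir = globalenv() to list2env() should recreate ds1, ..., ds10 as top-level objects, though keeping them in the list is usually easier to loop over.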