I am working with the BTYD model to forecast customers' future transactions. Because it relies on MCMC methods, I cannot run the forecast on my whole customer base (hundreds of thousands of customers), so I have to split the base into many random samples and run the model on each of them separately to retrieve the forecast.
My idea was to use a loop to do the following:
(Every ID must be in one sample only).
Unfortunately, my code doesn't work the way I want (I am not very good with loops at the moment).
getwd()
data<-read.csv("MOCK_DATA (1).csv")
# this is a fake dataset of 1000 rows that contains only 2 columns:
# customer ID (column name: "id") and a random number (column name "value").
# Every customer ID appears only once in the dataset.
head(data)
set.sample.size<-100
num.cycles<-ceiling(nrow(data)/set.sample.size)
for(i in 1:(num.cycles)) {
nam <- paste("sample_", i, sep = "")
assign(nam, data[sample(nrow(data), set.sample.size), ])
data<-data[!(data$id %in% nam$id),]
}
This code generates the following error: Error in nam$id : $ operator is invalid for atomic vectors
What I expect is to get 10 objects called "sample_1" to "sample_10", each made of 100 random IDs from the original data, with no ID shared between the 10 samples.
Here's a reproducible example using the iris dataset:
set.sample.size<-10
num.cycles<-ceiling(nrow(iris)/set.sample.size)
iris$id <- 1:150
for(i in 1:(num.cycles)) {
nam <- paste("sample_", i, sep = "")
assign(nam, iris[sample(nrow(iris), set.sample.size), ])
iris<-iris[!(iris$id %in% get(nam)$id),]
}
The only issue in your code is that nam$id doesn't make sense, since nam is simply a string (the name of the data frame, not the data frame itself). Calling get(nam) retrieves the object that the string names, so get(nam)$id works.
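
As an alternative to the loop, here is a minimal sketch that shuffles the rows once and cuts them into chunks, which avoids assign() and get() altogether. The data frame df is a made-up stand-in for the mock data, and the names sample_size, shuffled, group and samples are illustrative only, not part of the original code.

```r
set.seed(42)  # only so that this sketch is reproducible

# Hypothetical stand-in for the mock data: 1000 unique IDs and a random value
df <- data.frame(id = 1:1000, value = runif(1000))
sample_size <- 100

# Shuffle all rows once; each ID then appears exactly once in the shuffled data
shuffled <- df[sample(nrow(df)), ]

# Label rows 1..100 as chunk 1, rows 101..200 as chunk 2, and so on
group <- ceiling(seq_len(nrow(shuffled)) / sample_size)

# Split into a named list of data frames: samples$sample_1, ..., samples$sample_10
samples <- split(shuffled, group)
names(samples) <- paste0("sample_", names(samples))
```

Each sample is then available as samples$sample_1 or samples[["sample_1"]], and the whole list can be passed to lapply() to run the model on every chunk. If the number of rows is not a multiple of the sample size, the last element simply has fewer rows, whereas the loop version would stop with an error from sample() on the final iteration.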