Background and context
I am new to R, but I have a basic understanding of how to run a bootstrap procedure for individual variables. However, the online guides I have looked at only use examples with a single variable, and their output is a histogram of the means generated across all the resamples.
I want to bootstrap a sample made up of two related variables (participant age and test score). I understand how I could bootstrap each variable independently (age or score), but because participants of the same age sometimes get different scores, I am not sure how to determine which score corresponds to the age that is bootstrapped.
For example, my data contains one 20-year-old participant with a score of 50 and another 20-year-old with a score of 70. If I bootstrap with replacement based on age, one of the 20-year-olds might be selected (and, because sampling is with replacement, possibly selected again). However, I do not know what their corresponding score would be; that is, I do not know whether the participant who scored 50 or the one who scored 70 was selected.
Others I have asked suggested that I might need to resample age and score together, as a single row, to preserve the dependency between them. My data in R has one row per participant, with age in one column and score in another.
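To make the problem concrete, here is a minimal R sketch using hypothetical toy data (the data frame `dat` and the helper `pair_key` are made up for illustration). It contrasts resampling each column on its own with resampling whole rows, which is the approach suggested above.

```r
# Hypothetical toy data based on the example above: two 20-year-olds
# with different scores, plus one 30-year-old
dat <- data.frame(age = c(20, 20, 30), score = c(50, 70, 60))

set.seed(42)  # arbitrary seed, for reproducibility only

# Resampling each column separately: the age and score in a given row
# may come from different participants, so the pairing can be lost
indep <- data.frame(age   = sample(dat$age,   replace = TRUE),
                    score = sample(dat$score, replace = TRUE))

# Resampling row indices: every draw copies a whole participant,
# so each age stays paired with the score it was observed with
idx    <- sample(nrow(dat), replace = TRUE)
paired <- dat[idx, ]

# Check which resampled (age, score) pairs occur in the original data
pair_key <- function(d) paste(d$age, d$score)
pair_key(indep)  %in% pair_key(dat)  # can contain FALSE
pair_key(paired) %in% pair_key(dat)  # always TRUE
```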
What am I looking for?
The end goal of the bootstrapping is to resample my data (with replacement) 200 times, giving 200 "different" datasets. I will fit a quadratic function to each one and determine the vertex of the fitted curve. The 200 vertex values will then be combined to give a mean and standard error.
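For reference, if the quadratic is fitted as score on age (the notation below, with x for age, y for score and B resamples, is an assumption rather than taken from the question), the vertex of each fit and the bootstrap summaries of the B = 200 vertex positions can be written as follows, where x*_b is the vertex age from resample b:

```latex
\hat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^{2}, \qquad
x^{*} = -\frac{\hat\beta_1}{2\hat\beta_2}, \qquad
y^{*} = \hat\beta_0 - \frac{\hat\beta_1^{2}}{4\hat\beta_2}

\bar{x}^{*} = \frac{1}{B}\sum_{b=1}^{B} x^{*}_{b}, \qquad
\widehat{\mathrm{SE}}\left(x^{*}\right) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(x^{*}_{b} - \bar{x}^{*}\right)^{2}}, \qquad B = 200
```

The bootstrap standard error is simply the standard deviation of the 200 vertex estimates.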
Having little experience with R, I have not tried much beyond the basics of bootstrapping with replacement.
I know it is possible to mutate or merge data, but I do not think that applies here. I am not sure how to proceed, and any help (sources of information, where to look, etc.) would be greatly appreciated.
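You could resample the row indices rather than the individual columns. Each sampled index selects a whole row, so every resampled age stays paired with the score it was observed with.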
For example:
set.seed(1)
# Toy data: two participants at each age, each with a random score
df <- data.frame(age = rep(seq(20, 50, 10), each = 2), score = sample(50:70, 8))
df
age score
1 20 68
2 20 62
3 30 53
4 30 67
5 40 66
6 40 55
7 50 51
8 50 65
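Resample the rows with replacement. In the output below, row 1 was drawn three times; R labels the repeats 1.1 and 1.2 to keep the row names unique, and each copy keeps its original age and score: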
# Draw nrow(df) row indices with replacement; each index selects a whole row
df[sample(seq_len(nrow(df)), nrow(df), replace = TRUE), ]
age score
6 40 55
4 30 67
1 20 68
7 50 51
1.1 20 68
5 40 66
1.2 20 68
3 30 53
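Extending this index resampling to the goal in the question (200 resamples, a quadratic fit per resample, then the mean and standard error of the vertex), a minimal sketch might look like the following. The simulated data (with a vertex built in at age 45), the helper name `vertex_age`, and the use of `lm(score ~ age + I(age^2))` as the quadratic fit are all assumptions for illustration; substitute your own data frame and model.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical data with a curved trend peaking at age 45; replace with
# your own data frame (one row per participant, columns `age` and `score`)
n  <- 100
df <- data.frame(age = sample(18:70, n, replace = TRUE))
df$score <- 80 - 0.02 * (df$age - 45)^2 + rnorm(n, sd = 3)

# Hypothetical helper: fit score ~ age + age^2 and return the age at
# the vertex, -b1 / (2 * b2)
vertex_age <- function(d) {
  b <- coef(lm(score ~ age + I(age^2), data = d))
  unname(-b["age"] / (2 * b["I(age^2)"]))
}

B <- 200  # number of bootstrap resamples, as in the question

vertices <- replicate(B, {
  # Resample whole rows so each age keeps its own score
  idx <- sample(seq_len(nrow(df)), nrow(df), replace = TRUE)
  vertex_age(df[idx, ])
})

mean(vertices)  # bootstrap mean of the vertex age
sd(vertices)    # bootstrap standard error (SD of the 200 estimates)
```

`replicate()` keeps the loop compact. The `boot` package implements the same row-resampling idea through `boot()`, whose statistic function receives the data and a vector of resampled indices.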