I am trying to understand how to properly combine lapply, rbind and do.call in one statement, and I can't get the statement to run. I have supplied a simple example function and data that I'm using to understand the formatting. I fully understand that this scenario could be run using a simpler method; the purpose is simply to understand the formatting and how to use lapply and rbind with a custom function.
Here's some test data:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
65 1009 8
22 1222 95
22 1223 89
I'm essentially trying to subset the data inside a custom function using a list of facility ID values, and then bind together the data tables that the function returns.
Here's the code I've used:
facilities_id_list<-c(123, 25)
facility_counts<-function(facilities_id_list){
facility<-facilities_id_list[[i]]
subset<-data[facility_id==facility]
}
results <- do.call("rbind", lapply(seq_along(facilities_id_list), function(i) facility_counts)
The result I'm hoping to achieve:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
Why does this not work? Do I need to change the formatting?
Instead of using ==, use %in% for direct subsetting:
subset(data, facility_id %in% facilities_id_list)
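As a self-contained sketch of that one-liner, the sample data from the question can be rebuilt by hand. The data.frame construction below is an assumption, since the question does not show how data was created (it may well be a data.table there):

```r
# Rebuild the question's sample data (assumed here to be a plain data.frame).
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Keep every row whose facility_id matches any value in the list.
result <- subset(data, facility_id %in% facilities_id_list)
print(result)
```

Note that the rows come back in the order they appear in data, not in the order of facilities_id_list; with this sample the two happen to agree.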
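In the OP's code, there are multiple issues:
1) The function's input argument is facilities_id_list, whereas in lapply we are looping over the sequence index i. The anonymous function function(i) facility_counts also never calls facility_counts; it just returns the function object.
2) facility_id==facility should be data$facility_id==facility, because we are using [ on a data.frame, which does not look up column names inside data.
3) We need to specify that we are subsetting by row index with a trailing comma; by default, without any comma, the index is taken as a column index in a data.frame.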
facility_counts<-function(i){
facility<-facilities_id_list[[i]]
data[data$facility_id == facility, ]
}
> do.call(rbind, lapply(seq_along(facilities_id_list), facility_counts))
facility_id patient_number test_result
1 123 1000 25
2 123 1000 30
3 25 1001 12
4 25 1002 67
5 25 1010 75
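If the index i is not needed for anything else, lapply can also loop over the ID values themselves. This is only a sketch under the same assumption that data is a plain data.frame; rows_for_facility is a hypothetical helper name, not something from the question:

```r
# Rebuild the question's sample data (assumed here to be a plain data.frame).
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Hypothetical helper: takes a facility id (not an index) and returns its rows.
rows_for_facility <- function(id) {
  data[data$facility_id == id, ]  # trailing comma selects rows, keeps all columns
}

# lapply returns one data.frame per id; do.call(rbind, ...) stacks them.
pieces  <- lapply(facilities_id_list, rows_for_facility)
results <- do.call(rbind, pieces)
print(results)
```

Here the pieces are stacked in the order of facilities_id_list, so reordering the list reorders the result, which is not the case with the %in% approach.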