I have a problem that I would really appreciate some help with. I have a large dataframe that looks like this. Please note that all of this R code is run in a terminal, not in RStudio.
![Dataframe](https://i.sstatic.net/OkmfC.jpg)
What I'm trying to do is separate the dataframe by the unique val_lvl2 treatments.
Here is code that does exactly what I want for one treatment, but I need to do it on a much larger scale.
Function code:
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
CODE:
holder1 <- subset(z_combined_cost_dtrmnt, val_lvl2 == "Hammer Toe Repair")
holder1 <- holder1[!(holder1$episode_count <=3),]
holder1$prd_num_of_days_num <- remove_outliers(holder1$prd_num_of_days_num)
This replaces all of the outlier lengths for Hammer Toe Repair in val_lvl2 with NA, which is exactly what I want. However, I don't want to repeat this step by hand, since there are quite a few unique treatments. After removing the outliers I also need to remove the NA values and merge all the data back into the one dataframe "z_combined_cost_dtrmnt", which should then have outlier lengths removed separately for each unique treatment in val_lvl2. The code above is as far as I've gotten, so help would be greatly appreciated. I am positive there is a more efficient way to do this than writing out this code for each treatment.
Here is every unique treatment in val_lvl2:
![Unique values](https://i.sstatic.net/ky68G.jpg)
You can use `split` to create a list of data frames by level of `val_lvl2` ...
holders <- split(z_combined_cost_dtrmnt, z_combined_cost_dtrmnt$val_lvl2)
And then apply whatever functions you want to each element of that list using `lapply`, e.g.
holders <- lapply(holders, function(x) x[!(x$episode_count <= 3), ])
holders <- lapply(holders, function(x) {
  x$prd_num_of_days_num <- remove_outliers(x$prd_num_of_days_num)
  return(x)
})
You will end up with a list of dataframes, one for each level of `val_lvl2`.
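The question also asks for the pieces to be merged back into a single dataframe with the NA lengths dropped. One possible way to do that is sketched below, using `do.call(rbind, ...)` on the list and then filtering with `is.na`. The `toy` data frame, its values, and the "Treatment B" level are made up for illustration and stand in for `z_combined_cost_dtrmnt`; `combined` is a hypothetical name for the result.

```r
# Toy stand-in for z_combined_cost_dtrmnt (all values are invented)
toy <- data.frame(
  val_lvl2 = rep(c("Hammer Toe Repair", "Treatment B"), each = 6),
  episode_count = c(5, 5, 2, 5, 5, 5, 4, 4, 4, 4, 1, 4),
  prd_num_of_days_num = c(10, 11, 12, 9, 10, 200, 30, 31, 29, 30, 30, 500),
  stringsAsFactors = FALSE
)

# Same outlier function as in the question
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# One data frame per treatment, filtered and cleaned independently
holders <- split(toy, toy$val_lvl2)
holders <- lapply(holders, function(x) {
  x <- x[!(x$episode_count <= 3), ]
  x$prd_num_of_days_num <- remove_outliers(x$prd_num_of_days_num)
  x
})

# Stack the per-treatment data frames back into a single data frame
combined <- do.call(rbind, holders)

# Drop the rows whose length was set to NA by remove_outliers
combined <- combined[!is.na(combined$prd_num_of_days_num), ]

# rbind on a named list produces row names like "Treatment B.7"; reset them
rownames(combined) <- NULL
```

Because the quantiles are computed inside each list element, a length that is ordinary for one treatment can still be flagged as an outlier for another. If the original dataframe already contains NA lengths, the `is.na` filter will drop those rows as well, which may or may not be what you want.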