I'm looking for some help with adding some columns to a data frame that I can't seem to get right. My df is about 40,000 rows x 30 columns and is structured like this:
Key 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 NumNotNA
317862 2151 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0
317881 1105 51 33 98 8 27 56 28 17 39 11 112 14 87 7 273 56 189 245 119 7 147 25 NA NA NA 22
290507 111 56 28 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
290509 196 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0
290520 91 0 2 1 4 1 1 0 0 0 1 1 4 0 1 0 5 0 1 0 1 0 0 4 1 0 25
290521 92 2 5 22 3 4 1 19 0 5 0 1 13 0 0 0 1 0 0 1 1 0 46 15 0 1 25
I am looking to add 11 new columns to the data frame, one for each weighted quantile in the sequence seq(0,1,0.1). I am using the following cosine weights to weight the observations from column names 1:25 in the df above.
cosine_weights <- 1 + cos(pi*seq(0,24)/24)
I am able to get the values I'm looking for by using the wtd.quantile() fucntion from both Hmisc and reldist packages on a single row, but I cannot seem to figure out how to use the apply functions correctly in a useful way to apply this to the entire data frame. Note, one of the issues is that several rows (like rows 1 & 4 above) contain all NA values for column names 1:25. For these rows, I would like my quantiles to yield NA values as well, but to yield valid quantiles for the other rows.
Here is an example of the output I'm looking for from each row, although I would like to append this as new columns to my existing data frame. I have tried to use this with mapply, but keep running into issues. I have to coerce the data as numeric to get this to compute, but when I try to apply it to the full df, it says that my (list) object cannot be coerced to type 'double'. Each fix I try seems to fail, and I cannot find any other similar examples through my searches.
> Hmisc::wtd.quantile(as.numeric(dat[6, 3:27]), weights=cosine_weights, probs = seq(0,1,0.1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
0.0 0.0 0.0 1.0 1.6 3.0 4.0 5.0 6.6 20.8 46.0
Thank you for your help!
We can write a function to claculate the values that we want
calculate_quantile <- function(x) {
if (all(is.na(x))) return(rep(NA, 11))
as.numeric(Hmisc::wtd.quantile(x, weights=cosine_weights, probs = seq(0,1,0.1)))
}
and apply it row-wise using apply
dat[paste0('Q', 1:11)] <- t(apply(dat[3:27], 1, calculate_quantile))