Expanding a Frequency Table Where the Variable Names are the Values


I am working with a data frame in which each row is identified by an ID, and the remaining column names (1, 2, 3) are the possible values of a variable, as if they were the levels of a factor. The number in each cell is the frequency of that value for that ID. Here is a simplified version:

ID  1  2  3
A   2  3  2
B   1  4  1

I would like to get two vectors that expand the frequencies so that I can calculate an interpolated median for each ID. That is, I'd like something of the form:

A  B
1  1
1  2
2  2
2  2
2  2
3  3
3

The psych package has a function, interp.median, that could then take each vector and return the interpolated median for each ID, which I would like to add as a new variable in the original data frame. I looked at the vcdExtra package, whose expand.dft function might be able to do this, but I'm not sure exactly how it would work.

Any help would be greatly appreciated!

EDIT: To refine a bit more, interp.median would work best if the final result was a data frame, with NAs padded at the end. That is, something of the form:

A  B
1  1
1  2
2  2
2  2
2  2
3  3
3  NA

Solution

  • If dat is the dataset (defined under "data" below):

      # For each ID, repeat each column position (1, 2, 3) by its frequency
      lst <- by(dat[,-1], dat[,1], function(x) rep(seq_along(x), x))
      lst
      #dat[, 1]: A
      #[1] 1 1 2 2 2 3 3
      #------------------------------------------------------------
      #dat[, 1]: B
      #[1] 1 2 2 2 2 3

      # Pad the shorter vectors with NA up to the longest length
      indx <- max(sapply(lst,length))
      dat2 <- do.call(data.frame,lapply(lst, function(x) c(x,rep(NA,indx-length(x)))))
      dat2
      #  A  B
      #1 1  1
      #2 1  2
      #3 2  2
      #4 2  2
      #5 2  2
      #6 3  3
      #7 3 NA
    

    Or, using split() instead of by():

      lst2 <- lapply(split(dat[,-1], dat$ID), function(x) rep(seq_along(unlist(x)), unlist(x)))

      # indx is the maximum length computed above
      do.call(data.frame,lapply(lst2, function(x) c(x,rep(NA,indx-length(x)))))
    

    data

      dat <- structure(list(ID = c("A", "B"), `1` = c(2L, 1L), `2` = 3:4,
          `3` = c(2L, 1L)), .Names = c("ID", "1", "2", "3"),
          class = "data.frame", row.names = c(NA, -2L))
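
To finish what the question asks for (an interpolated median per ID stored as a new column), here is a minimal base-R sketch. It works from the frequencies directly, so the expanded vectors are not needed. The helper `interp_median_freq` and the column name `interp_median` are made up for illustration and are not part of psych. The helper uses the textbook grouped-data formula with a class width of 1, which is an assumption: it may not match `psych::interp.median` in every edge case (for example when the median falls exactly on a class boundary), so check it against your own data before relying on it.

```r
# Same example data as above
dat <- data.frame(ID = c("A", "B"),
                  `1` = c(2L, 1L), `2` = 3:4, `3` = c(2L, 1L),
                  check.names = FALSE)

# Hypothetical helper: interpolated median from a frequency vector.
# 'values' are the scores the frequencies belong to; 'w' is the class width.
interp_median_freq <- function(freq, values = seq_along(freq), w = 1) {
  freq  <- as.numeric(unlist(freq))
  n     <- sum(freq)
  cum   <- cumsum(freq)
  k     <- which(cum >= n / 2)[1]   # first class whose cumulative count reaches n/2
  below <- cum[k] - freq[k]         # number of observations below that class
  # lower class limit plus the interpolated share of the class width
  values[k] - w / 2 + w * (n / 2 - below) / freq[k]
}

# Apply row by row to the frequency columns and store the result
dat$interp_median <- apply(dat[-1], 1, interp_median_freq)
dat$interp_median
# for the example data both rows come out as 2
```

If you would rather use psych itself, the columns of dat2 from the solution above can be passed to `interp.median` one at a time; its `na.rm` argument defaults to TRUE, so the NA padding should not get in the way.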