Search code examples
rdataframedatatablerepeat

R Repeat values based on value in dataframe


I am trying to create a long data frame whose values are created from a lookup dataframe

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

df_lookup
#>   id one two three
#> 1  1  10   0     0
#> 2  2   9   1     0
#> 3  3   7   2     1

The output I am looking for is a data frame with 30 rows, where first ten rows are all 1s, for the next 10 rows the values are nine 1s and one 2, for the last 10 rows the values are seven 1s, two 2s and one 3.

df_output

   id bin
 1:  1   1
 2:  1   1
 3:  1   1
 4:  1   1
 5:  1   1
 6:  1   1
 7:  1   1
 8:  1   1
 9:  1   1
10:  1   1
11:  2   1
12:  2   1
13:  2   1
14:  2   1
15:  2   1
16:  2   1
17:  2   1
18:  2   1
19:  2   1
20:  2   2
21:  3   1
22:  3   1
23:  3   1
24:  3   1
25:  3   1
26:  3   1
27:  3   1
28:  3   2
29:  3   2
30:  3   3

Based on some online search for similar questions like here

I was able to come up with the following code

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

col_names = c("one","two","three")
setDT(df_lookup)

df_output = data.frame()

for (j in 1:length(col_names)){
  temp_df = df_lookup[, .(rep(j, get(as.character(col_names[j])))),.(id)] 
  df_output = rbind(df_output,temp_df) 
}

names(df_output) = c("id","bin")

df_output = df_output[order(df_output$id,df_output$bin),]

While this solves the purpose, it can take some time to run when there are many 'id' or many 'df_lookup' tables which I need to loop through.

So wanted to check if there's any optimal/faster way to achieve 'df_output'


Solution

  • A data.table solution using melt() and rep()

    library(data.table)
    
    df_lookup = data.frame(id = c(1,2,3),
                           one = c(10,9,7),
                           two = c(0,1,2), 
                           three = c(0,0,1))
    
    dt <- data.table::as.data.table(df_lookup)
    
    # into long format
    dt_melt <- melt(dt, id.vars = "id")
    dt_melt
    #>    id variable value
    #> 1:  1      one    10
    #> 2:  2      one     9
    #> 3:  3      one     7
    #> 4:  1      two     0
    #> 5:  2      two     1
    #> 6:  3      two     2
    #> 7:  1    three     0
    #> 8:  2    three     0
    #> 9:  3    three     1
    dt_exploded <- dt_melt[, rep(variable, value), by = id]
    dt_exploded[, bin := data.table::fcase(V1 == "one", 1,
                                           V1 == "two", 2,
                                           V1 == "three", 3)][]
    #>     id    V1 bin
    #>  1:  1   one   1
    #>  2:  1   one   1
    #>  3:  1   one   1
    #>  4:  1   one   1
    #>  5:  1   one   1
    #>  6:  1   one   1
    #>  7:  1   one   1
    #>  8:  1   one   1
    #>  9:  1   one   1
    #> 10:  1   one   1
    #> 11:  2   one   1
    #> 12:  2   one   1
    #> 13:  2   one   1
    #> 14:  2   one   1
    #> 15:  2   one   1
    #> 16:  2   one   1
    #> 17:  2   one   1
    #> 18:  2   one   1
    #> 19:  2   one   1
    #> 20:  2   two   2
    #> 21:  3   one   1
    #> 22:  3   one   1
    #> 23:  3   one   1
    #> 24:  3   one   1
    #> 25:  3   one   1
    #> 26:  3   one   1
    #> 27:  3   one   1
    #> 28:  3   two   2
    #> 29:  3   two   2
    #> 30:  3 three   3
    #>     id    V1 bin
    

    You could do the mapping of one to 1 at any time though