Search code examples
rdatesortingcalculated-columns

enumerate instances in R and add a new column


I have a table:

dt <- data.table(instance = c("A","A","A","B","B","B", "C","C","C","C","C","A","A",
    "B","B","B", "C","C","C","C","C"), 
   date = c("2019-02-25","2019-02-25","2019-02-25","2019-02-25","2019-02-25",
   "2019-02-25", "2019-02-25","2019-02-25","2019-02-25","2019-02-25",
   "2019-02-25","2019-03-01","2019-03-01","2019-03-01","2019-03-01",
   "2019-03-01", "2019-03-01","2019-03-01","2019-03-01","2019-03-01","2019-03-01"), 
 y = c("0,1","0,2","0,2","0,1","0,1","0,15","0,1","0,2","0,3","0,1","0,1",
   "0,1","0,1","0,1","0,25","0,3","0,1","0,1","0,15","0,1","0,2")
dt

I need to add a column "N" in which instances will be ordered from 1 to max number of instances for currency (here maximum number is 5 (number of rows with currency RON)). And all types of currency should be enumerated from 1 to this maximum number. And if there is smaller number of variables for some currencies it should add rows where values for column "N" will be missing Na.

So, I need a code after which I could get the following table:

| instance | date | y  | N|
:-----|-----| ------|-----|
| A | 2019-02-25 | 0,1 |1|
| A | 2019-02-25 |0,2  |2|
| A | 2019-02-25 |0,2  |3|
| A | 2019-02-25 |Na   |4|
| A | 2019-02-25 |Na   |5|
| B | 2019-02-25 |0,1  |1|
| B | 2019-02-25 |0,1  |2|
| B | 2019-02-25 |0,1  |3|
| B | 2019-02-25 |Na   |4|
| B | 2019-02-25 |Na   |5|
| C | 2019-02-25 |0,1  |1|
| C | 2019-02-25 |0,2  |2|
| C | 2019-02-25 |0,3  |3|
| C | 2019-02-25 |0,1  |4|
| C | 2019-02-25 |0,1  |5|
...

Solution

  • This is a perfect opportunity for tidyr::complete.

    library(dplyr)
    library(tidyr)
    
    dat  |>
        group_by(currency, date)  |>
        mutate(N = row_number())  |>
        ungroup()  |>
        complete(currency, date, N) |>
        arrange(date, currency, N)
    
    # # A tibble: 30 x 4
    #    currency date           N y    
    #    <chr>    <chr>      <int> <chr>
    #  1 EUR      2019-02-25     1 0,1
    #  2 EUR      2019-02-25     2 0,2
    #  3 EUR      2019-02-25     3 0,2
    #  4 EUR      2019-02-25     4 NA
    #  5 EUR      2019-02-25     5 NA
    #  6 RON      2019-02-25     1 0,1
    #  7 RON      2019-02-25     2 0,2
    #  8 RON      2019-02-25     3 0,3
    #  9 RON      2019-02-25     4 0,1
    # 10 RON      2019-02-25     5 0,1
    # # ... with 20 more rows