How to add a factor in a new column according to the existing factors


Data looks like this:

   statenum casenum vnumber pnumber numfatal
1        48    3081       1       1        1
2        48    3080       5       1        1
3        48    3080       4       1        1
4        48    3080       1       1        1
5        48    3080       2       1        1
6        48    3080       3       1        1
7        48    3079       1       1        1
8        47    3080       1       1        1
9        47    3080       3       4        1
10       47    3080       2       3        1
11       47    3080       3       2        1
12       47    3080       2       2        1
13       47    3080       3       3        1
14       47    3080       2       1        1
15       47    3080       4       1        1
16       47    3080       3       1        1
17       47    3077       2       1        1

I have 5 rows with statenum=48 and casenum=3080, and 9 rows with statenum=47 and casenum=3080.

How can I add a factor column that takes the values 5 and 9 in those two groups of rows, respectively?

I hope to add a column like this:

   statenum casenum vnumber pnumber numfatal new row
1        48    3081       1       1        1       1
2        48    3080       5       1        1       5
3        48    3080       4       1        1       5
4        48    3080       1       1        1       5
5        48    3080       2       1        1       5
6        48    3080       3       1        1       5
7        48    3079       1       1        1       1
8        47    3080       1       1        1       9
9        47    3080       3       4        1       9
10       47    3080       2       3        1       9
11       47    3080       3       2        1       9
12       47    3080       2       2        1       9
13       47    3080       3       3        1       9
14       47    3080       2       1        1       9
15       47    3080       4       1        1       9
16       47    3080       3       1        1       9
17       47    3077       2       1        1       1

The new column should show the number of rows that share the same values of statenum and casenum.


Solution

  • Something like this, I think:

    df$new <- with(df, ave(seq_len(nrow(df)), list(statenum, casenum), FUN = length))
    
    > df
       statenum casenum vnumber pnumber numfatal new
    1        48    3081       1       1        1   1
    2        48    3080       5       1        1   5
    3        48    3080       4       1        1   5
    4        48    3080       1       1        1   5
    5        48    3080       2       1        1   5
    6        48    3080       3       1        1   5
    7        48    3079       1       1        1   1
    8        47    3080       1       1        1   9
    9        47    3080       3       4        1   9
    10       47    3080       2       3        1   9
    11       47    3080       3       2        1   9
    12       47    3080       2       2        1   9
    13       47    3080       3       3        1   9
    14       47    3080       2       1        1   9
    15       47    3080       4       1        1   9
    16       47    3080       3       1        1   9
    17       47    3077       2       1        1   1
    

    You may also be interested in the "data.table" package:

    library(data.table)
    DT <- as.data.table(df)
    # .N is the number of rows in the current by-group; := adds the column by reference
    DT[, new_col := .N, by = list(statenum, casenum)]
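
For anyone who wants to try the base R approach end to end, here is a minimal sketch. It rebuilds the example data by hand from the table in the question, counts the rows per statenum/casenum pair with ave(), and then converts the counts to a factor, since the question asks for one. The column name new_factor and the variable counts_back are my own choices for illustration, not something from the original post.

```r
# Rebuild the example data from the question (values copied from the table).
df <- data.frame(
  statenum = c(rep(48, 7), rep(47, 10)),
  casenum  = c(3081, rep(3080, 5), 3079, rep(3080, 9), 3077),
  vnumber  = c(1, 5, 4, 1, 2, 3, 1, 1, 3, 2, 3, 2, 3, 2, 4, 3, 2),
  pnumber  = c(1, 1, 1, 1, 1, 1, 1, 1, 4, 3, 2, 2, 3, 1, 1, 1, 1),
  numfatal = rep(1, 17)
)

# Count rows per (statenum, casenum) group; ave() returns one value per row,
# so the result lines up with the original rows.
df$new <- with(df, ave(seq_len(nrow(df)), statenum, casenum, FUN = length))

# ave() gives integer counts, not a factor. Convert explicitly if a factor
# is really what is needed (new_factor is a hypothetical column name).
df$new_factor <- factor(df$new)

# Going back from the factor to the counts needs as.character() first;
# as.integer() alone would return the internal level codes instead.
counts_back <- as.integer(as.character(df$new_factor))

print(df)
```

Unless the counts are going to be used as a grouping variable (in a plot or a model, say), keeping the plain integer column is probably simpler than the factor.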