Search code examples
rmergedataframecategorization

How to Use reference table to insert rows into a data frame in R?


I have a data frame (labels) that I would like to use as a reference or lookup table of the form:

V1  V2
1   1   WALKING
2   2   WALKING_UPSTAIRS
3   3   WALKING_DOWNSTAIRS
4   4   SITTING
5   5   STANDING
6   6   LAYING

The data frame to use the reference table is (test, ncol = 564, nrow = 2947) where the first three colnames are (test_subject, test_label(num 1-6), data_set) where test_label(1-6) equal the strings referenced above.

Could someone help me figure out how I can use my lookup table to insert a new column called "activity_label" and each observation of that column would correspond to the string equivalent of the referenced number from the reference table.

E.g., if test_label row 1 equals 5 then activity_label row 1 would equal "Standing"

Thanks so much for all of your help!

#

After using the merge method:

> test2[1:10, 564: 565]
   angle(Z,gravityMean) activity_label
1            0.04404283        walking
2            0.04134032        walking
3            0.04295217        walking
4            0.03611571        walking
5           -0.09080307        walking
6           -0.08602478        walking
7           -0.07997668        walking
8            0.04372663        walking
9            0.19900166        walking
10           0.20350821        walking

analyzing structure of the remaining dfs

> str(test1)
'data.frame':   2947 obs. of  565 variables:
 $ test_labels                         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ test_subject                        : int  12 12 12 12 4 4 4 12 9 9 ...
 $ observ_set                          : Factor w/ 1 level "test": 1 1 1 1 1 1 1 1 1 1 ...
 $ tBodyAcc-mean()-X                   : num  0.228 0.303 0.237 0.306 0.29 ...

> str(train1)
'data.frame':   7352 obs. of  565 variables:
 $ train_labels                        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ V1                                  : int  27 7 7 26 7 26 6 6 6 7 ...
 $ observ_set                          : Factor w/ 1 level "train": 1 1 1 1 1 1 1 1 1 1 ...
 $ tBodyAcc-mean()-X                   : num  0.262 0.354 0.344 0.292 0.314 ...

Solution

  • One way is to use ifelse :

    if data frame = test and activity number column = activitynum,

    test$activitylabel <- ifelse(test$activitynum == 1, "walking, ifelse(test$activitynum == 2, "walking_upstairs", ifelse(test$activitynum == 3, "walking_downstairs", ifelse(test$activitynum == 4, "sitting", ifelse(test$activitynum == 5, "standing", ifelse(test$activitynum == 6, "laying", NA))))))
    

    another way is to create a look-up table and then do a merge as suggested by @Jaehyeon:

    lookup <- data.frame(activitynum = c(1,2,3,4,5,6), activity = c("walking", "walking_upstairs", "walking_downstairs", "standing", "sitting", "laying"))
    
    survey <- data.frame(id = c(seq(1:10)), activitynum = floor(runif(10, 1, 7)), var1 = runif(10, 1, 100))
    
    merge(survey, lookup, by = "activitynum", all.x = TRUE)
    
    > str(lookup)
    'data.frame':   6 obs. of  2 variables:
     $ activitynum: num  1 2 3 4 5 6
     $ activity   : Factor w/ 6 levels "laying","sitting",..: 4 6 5 3 2 1
    > str(survey)
    'data.frame':   10 obs. of  3 variables:
     $ id         : int  1 2 3 4 5 6 7 8 9 10
     $ activitynum: num  1 2 4 1 4 6 2 4 2 2
     $ var1       : num  52.3 60.5 53.3 49.8 73.1 ...