R: Select names of a data.frame using levels() function


I have two data.frames, input and filelist, which are joined with the code below:

input <- structure(list(NAME = structure(c(3L, 3L, 7L, 6L, 4L, 2L, 5L, 5L, 5L, 1L), .Label = c("Example2", "Example7", "Test", "Test2","Test3", "Test6", "Test77"), class = "factor"), REFERENCE = structure(c(2L,2L, 3L, 1L, 1L, 4L, 2L, 2L, 2L, 1L), .Label = c("EXAMPLE5", "REGION1", "REGION2", "REGION77"), class = "factor"), VALUE = structure(c(1L,1L, 2L, 3L, 4L, 6L, 5L, 5L, 5L, 7L), .Label = c("120", "13", "14", "65", "89", "B", "C"), class = "factor")), .Names = c("NAME", "REFERENCE", "VALUE"), class = "data.frame", row.names = c(NA,-10L))


filelist <- structure(list(NAME = structure(c(3L, 5L, 1L, 6L, 4L, 2L), .Label = c("","Example2", "Test", "Test2", "Test3", "Test6"), class = "factor"), REFERENCE = structure(c(3L, 3L, 1L, 2L, 2L, 2L), .Label = c("",  "EXAMPLE5", "REGION1"), class = "factor")), .Names = c("NAME","REFERENCE"), class = "data.frame", row.names = c(NA, -6L))

library(dplyr)
ana <- filelist %>% left_join(., input)

     NAME REFERENCE VALUE
1     Test   REGION1   120
2    Test3   REGION1    89
3                     <NA>
4    Test6  EXAMPLE5    14
5    Test2  EXAMPLE5    65
6 Example2  EXAMPLE5     C

The result is then split by REFERENCE and written into separate data.frames:

list2env(split(ana, f = ana$REFERENCE )[-1], .GlobalEnv)

This all works fine, but in the next step I want to access the NAME values for the group REGION1 (the result will later be part of a legend, where I don't want all the different items, only the selected ones). I am trying to do this with the levels() function:

NAMES_REGION1 <- levels(REGION1$NAME)

but my output is the following:

[1] ""         "Example2" "Test"     "Test2"    "Test3"    "Test6"

What I would like to have as output is only "Test" and "Test3", because only they are part of the group REGION1. Any ideas why this is happening?


Solution

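  • Since REGION1 was produced from a larger data set, its factor columns still carry every level of the original factor, whether or not that value occurs in the subset. You'll want to drop the unused levels after subsetting. You can use droplevels():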

    REGION1$NAME <- droplevels(REGION1$NAME)
    levels(REGION1$NAME)
    # [1] "Test"  "Test3"
    

    Notice that the REFERENCE column also has levels that were carried over. You can remove the extra levels in both columns at once (the [-3] index leaves out the third column, VALUE) with:

    REGION1[-3] <- lapply(REGION1[-3], droplevels) 
    

    Now we can see that all extra levels are gone:

    lapply(REGION1[-3], levels)
    # $NAME
    # [1] "Test"  "Test3"
    #
    # $REFERENCE
    # [1] "REGION1"
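
As a self-contained illustration of the behaviour described above, here is a minimal sketch in base R. The data frame df and the name region1 are made up for this example and only loosely mirror the question's data. It assumes you are happy to clean every factor column at once, which droplevels() does when given a whole data frame instead of a single factor.

```r
# Toy data standing in for the joined table (hypothetical values)
df <- data.frame(
  NAME      = factor(c("Test", "Test3", "Test6", "Test2")),
  REFERENCE = factor(c("REGION1", "REGION1", "EXAMPLE5", "EXAMPLE5")),
  VALUE     = c("120", "89", "14", "65"),
  stringsAsFactors = FALSE
)

# Subsetting rows does not touch the factor levels
region1 <- df[df$REFERENCE == "REGION1", ]
levels_before <- levels(region1$NAME)
levels_before
# [1] "Test"  "Test2" "Test3" "Test6"

# droplevels() on a data frame drops unused levels in all factor columns;
# non-factor columns such as VALUE are left alone
region1 <- droplevels(region1)
levels(region1$NAME)
# [1] "Test"  "Test3"
levels(region1$REFERENCE)
# [1] "REGION1"

# Alternative that ignores the levels and reads the values actually present
names_present <- unique(as.character(region1$NAME))
names_present
# [1] "Test"  "Test3"
```

If the names are only needed for a legend, the last approach may be enough on its own, since it never depends on which levels the factor happens to carry.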