I have two data.frames, input and filelist, which are subset with the code below:
input <- structure(list(NAME = structure(c(3L, 3L, 7L, 6L, 4L, 2L, 5L, 5L, 5L, 1L), .Label = c("Example2", "Example7", "Test", "Test2","Test3", "Test6", "Test77"), class = "factor"), REFERENCE = structure(c(2L,2L, 3L, 1L, 1L, 4L, 2L, 2L, 2L, 1L), .Label = c("EXAMPLE5", "REGION1", "REGION2", "REGION77"), class = "factor"), VALUE = structure(c(1L,1L, 2L, 3L, 4L, 6L, 5L, 5L, 5L, 7L), .Label = c("120", "13", "14", "65", "89", "B", "C"), class = "factor")), .Names = c("NAME", "REFERENCE", "VALUE"), class = "data.frame", row.names = c(NA,-10L))
filelist <- structure(list(NAME = structure(c(3L, 5L, 1L, 6L, 4L, 2L), .Label = c("","Example2", "Test", "Test2", "Test3", "Test6"), class = "factor"), REFERENCE = structure(c(3L, 3L, 1L, 2L, 2L, 2L), .Label = c("", "EXAMPLE5", "REGION1"), class = "factor")), .Names = c("NAME","REFERENCE"), class = "data.frame", row.names = c(NA, -6L))
library(dplyr)
ana <- filelist %>% left_join(., input)
      NAME REFERENCE VALUE
1     Test   REGION1   120
2    Test3   REGION1    89
3                     <NA>
4    Test6  EXAMPLE5    14
5    Test2  EXAMPLE5    65
6 Example2  EXAMPLE5     C
and then written into different data.frames:
list2env(split(ana, f = ana$REFERENCE )[-1], .GlobalEnv)
This all works fine, but in the next step I want to access the NAME values for the group REGION1 (the result will later be part of a legend where I don't want all the different items, only the selected ones). I am trying to do this with the levels() command:
NAMES_REGION1 <- levels(REGION1$NAME)
but my output is the following:
[1] "" "Example2" "Test" "Test2" "Test3" "Test6"
What I would like as output is only "Test" and "Test3", because only they are part of the group REGION1.
Any ideas why this is happening?
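Since REGION1 was produced from a larger data set, its factor columns still carry every level of the original data: subsetting a factor removes values, not levels. You'll want to drop the excess levels carried over after the subset. You can use droplevels: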
REGION1$NAME <- droplevels(REGION1$NAME)
levels(REGION1$NAME)
# [1] "Test" "Test3"
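To see why the unused levels survive in the first place, here is a minimal sketch using a made-up factor f (the name and values are illustrative, not taken from the question). It only relies on base R.

```r
# Hypothetical factor, for illustration only
f <- factor(c("Test", "Test3", "Test6"))

# Subsetting removes the value "Test6" but keeps its level
sub <- f[f != "Test6"]
levels(sub)
# [1] "Test"  "Test3" "Test6"

# droplevels() discards levels that no longer occur in the data
levels(droplevels(sub))
# [1] "Test"  "Test3"

# Re-creating the factor has the same effect
levels(factor(sub))
# [1] "Test"  "Test3"
```

The same thing happens to REGION1$NAME after split(): the rows are filtered, but the level set still comes from the full data.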
Notice that the REFERENCE column also has levels that were carried over. You can remove the extra levels in both columns at once with
REGION1[-3] <- lapply(REGION1[-3], droplevels)
Now we can see that the extra levels are gone from both columns:
lapply(REGION1[-3], levels)
# $NAME
# [1] "Test" "Test3"
#
# $REFERENCE
# [1] "REGION1"
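As a possible alternative, droplevels() also has a data.frame method, so the cleanup could be done once for every group right after the split. The sketch below uses a small stand-in for ana (the rows are a simplified assumption, not the full data from the question), and the name groups is hypothetical.

```r
# Simplified stand-in for `ana`; factor columns are created explicitly
ana <- data.frame(
  NAME      = factor(c("Test", "Test3", "Test6", "Test2")),
  REFERENCE = factor(c("REGION1", "REGION1", "EXAMPLE5", "EXAMPLE5")),
  VALUE     = c("120", "89", "14", "65")
)

# Split by group, then drop unused levels in every factor column of each piece
groups <- lapply(split(ana, ana$REFERENCE), droplevels)

levels(groups$REGION1$NAME)
# [1] "Test"  "Test3"
levels(groups$REGION1$REFERENCE)
# [1] "REGION1"

# For a legend, the values actually present can also be read directly,
# without touching the levels at all
unique(as.character(groups$REGION1$NAME))
# [1] "Test"  "Test3"
```

If the separate data.frames are still wanted in the global environment, the cleaned list can be passed to list2env() in the same way as before.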