I've been trying to figure out the behavior I am seeing below. Perhaps I am missing something obvious, but it has not yet dawned on me.
Consider the following:
> a<-sample(c("serious","not serious"), 10,T)
> a
[1] "serious" "not serious" "not serious" "serious" "serious" "serious" "serious"
[8] "not serious" "not serious" "not serious"
> m<-toupper(unique(a))
> names(m)<-unique(a)
> m
serious not serious
"SERIOUS" "NOT SERIOUS"
> m[a]
serious not serious not serious serious serious serious serious
"SERIOUS" "NOT SERIOUS" "NOT SERIOUS" "SERIOUS" "SERIOUS" "SERIOUS" "SERIOUS"
not serious not serious not serious
"NOT SERIOUS" "NOT SERIOUS" "NOT SERIOUS"
> m[as.factor(a)] # notice the different order
not serious serious serious not serious not serious not serious not serious
"NOT SERIOUS" "SERIOUS" "SERIOUS" "NOT SERIOUS" "NOT SERIOUS" "NOT SERIOUS" "NOT SERIOUS"
serious serious serious
"SERIOUS" "SERIOUS" "SERIOUS"
The results returned by indexing by name are correct, but the order in which they are returned differs from the order of a.
I thought that perhaps, when a is a factor, it actually indexes by the underlying integer value. But then why would I be retrieving the appropriate value for each name?
By the way, the behavior reverts to what I would expect if instead I have a factor with 3 levels.
> a<-sample(c("serious","not serious","unknown"), 10,T)
......
> m[a]
not serious serious not serious not serious serious serious unknown
"NOT SERIOUS" "SERIOUS" "NOT SERIOUS" "NOT SERIOUS" "SERIOUS" "SERIOUS" "UNKNOWN"
unknown unknown serious
"UNKNOWN" "UNKNOWN" "SERIOUS"
> m[as.factor(a)]
not serious serious not serious not serious serious serious unknown
"NOT SERIOUS" "SERIOUS" "NOT SERIOUS" "NOT SERIOUS" "SERIOUS" "SERIOUS" "UNKNOWN"
unknown unknown serious
"UNKNOWN" "UNKNOWN" "SERIOUS"
When you index by a factor, you are in fact indexing by its underlying integer codes, not by its labels. When you create a factor, R sorts the levels by default, so you see
> unique(a)
[1] "serious" "not serious"
> levels(as.factor(a))
[1] "not serious" "serious"
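The two orders are flipped, so the factor's integer codes point at the opposite elements of m. That's why every element of m[as.factor(a)] is the opposite of the corresponding element of m[a]. Each value still carries its own name along with it, which is why the name/value pairs look right even though the order is wrong. (You just happen to have 5 of each; with unbalanced counts the swap would be easier to notice.) You can force the levels of a factor into a specific order with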
> levels(factor(a, levels=unique(a)))
[1] "serious" "not serious"
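To make the mechanism concrete, here is a minimal sketch. It uses a small fixed vector instead of sample() so the result is reproducible (the input values and the names f, by_label, f_ordered and by_codes are my own choices for illustration, not from the question).

```r
# Fixed input (an illustrative assumption; the question used sample()).
a <- c("serious", "not serious", "not serious", "serious", "serious")

m <- toupper(unique(a))
names(m) <- unique(a)

f <- as.factor(a)
levels(f)       # sorted levels: "not serious" "serious"
as.integer(f)   # underlying integer codes: 2 1 1 2 2

# Indexing by a factor uses the integer codes, not the labels,
# so these two lookups are the same thing.
identical(m[f], m[as.integer(f)])

# Two ways to get a lookup by label instead:
by_label  <- m[as.character(f)]             # convert back to character first
f_ordered <- factor(a, levels = unique(a))  # or make the codes match m's order
by_codes  <- m[f_ordered]

identical(unname(by_label), toupper(a))
identical(by_codes, m[a])
```

If the factor comes from somewhere else and you cannot control its levels, going through as.character() is probably the safer of the two options, since it does not depend on the level order at all.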
But anyway, what you're doing does seem a bit odd. If a is meant to be a factor, you should convert it to one as early as possible.
(I got a different result than you for the three category example. The values were still switched for me. Perhaps the values from sample just happened to appear in alphabetical order that time.)
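That guess is consistent with the output printed in the question: the first appearances there are "not serious", "serious", "unknown", which is already the sorted order. A small sketch of the two cases, using hand-picked vectors rather than sample() (same_lookup is a hypothetical helper I made up for this illustration):

```r
# Hypothetical helper: TRUE when indexing by factor and indexing by
# name give the same result for the lookup vector built from a.
same_lookup <- function(a) {
  m <- toupper(unique(a))
  names(m) <- unique(a)
  identical(m[as.factor(a)], m[a])
}

# First appearances are already in sorted order, so unique(a) matches
# the factor levels and the two lookups agree.
sorted_case <- same_lookup(c("not serious", "serious", "not serious", "unknown"))

# First appearances are not in sorted order, so the integer codes
# point at different elements of m and the two lookups disagree.
unsorted_case <- same_lookup(c("serious", "unknown", "not serious", "serious"))

sorted_case
unsorted_case
```

So whether the factor version "works" depends only on the order in which the values first show up in a, not on how many levels there are.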