Here is a small data.frame:
e = data.frame(A=c(letters[1:5], 1:5), stringsAsFactors=TRUE)  # explicit for R >= 4.0.0, where this is no longer the default
I am a little bit confused regarding what's happening when I execute the following command:
unclass(e$A) %>% as.numeric()
I am getting the following output:
[1] 6 7 8 9 10 1 2 3 4 5
Why is a:e treated as 6:10?
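Your question raises a problem that is buried deep in the heart of every computer language: how to order characters. Assuming the column was converted to a factor (the default before R 4.0.0), its levels are the sorted unique values, and unclass() returns the position of each value in those levels, so the numbers you see depend on how the characters are sorted.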
The R help file ?sort says this:
The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison. The sort order for factors is the order of their levels
So you can try to find your locale. You may also want to check the ISO 14651 standard, which defines international string ordering and comparison rules. Depending on your locale, you might find differences in the ordering of very specific characters, but as far as I know digits always come first.
"a">"1"
#### [1] TRUE
"a">"A"
#### [1] FALSE
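To tie this back to the data.frame in the question, here is a minimal sketch. It assumes the column ends up as a factor (forced here with factor(), since newer versions of R no longer convert strings automatically); the names x and f are only for illustration.

```r
# Same values as in the question; c() coerces the integers to character
x <- c(letters[1:5], 1:5)

# Build the factor explicitly (older R did this inside data.frame())
f <- factor(x)

# Levels are the sorted unique values; digits sort before letters
levels(f)

# The integer codes are positions in levels(f), so "a" becomes 6
as.numeric(unclass(f))

# The collating sequence that determined the order of the levels
Sys.getlocale("LC_COLLATE")
```

Since "1" to "5" occupy the first five levels, the letters a to e get the codes 6 to 10.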
Edit:
The precedence between upper case and lower case does indeed depend on your system locale (English locales typically follow en_US, while other locales can follow ASCII or another ordering; see this Wikipedia paragraph). Try this:
Sys.setlocale("LC_COLLATE", "C")
sort(c(1,2,3,"a", "b", "c", "A", "B", "C"))
#### [1] "1" "2" "3" "A" "B" "C" "a" "b" "c"
Sys.setlocale("LC_COLLATE", "French_France.1252")
sort(c(1,2,3,"a", "b", "c", "A", "B", "C"))
#### [1] "1" "2" "3" "a" "A" "b" "B" "c" "C"
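If you need a result that does not change from one machine to another, two options may help. This is only a sketch, not the one right way to do it: sort() with method = "radix" is documented to order character vectors as in the C locale, and passing levels explicitly to factor() avoids sorting altogether. The names v and f are illustrative.

```r
v <- c(1, 2, 3, "a", "b", "c", "A", "B", "C")

# method = "radix" orders character vectors as in the C locale,
# whatever LC_COLLATE is currently set to
sort(v, method = "radix")

# Explicit levels fix the integer codes independently of any locale:
# here "a" is level 1 and "5" is level 10
f <- factor(c(letters[1:5], 1:5), levels = c(letters[1:5], 1:5))
as.numeric(f)
```

With explicit levels the codes follow the order you wrote down, so a:e map to 1:5 instead of 6:10.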
Similar issues have been discussed in this other SO question.