The task is to sort abbreviated US state names in English alphabetical order. However, I noticed that R sorts character vectors according to the operating system's language or regional settings. For example, in my language (Lithuanian) even the order of the Latin letters differs from their order in the English alphabet. Compare the order of the plain Latin letters only (leaving out the Lithuanian-specific ones) in both alphabets:
"ABCDEFGHI Y JKLMNOPRSTUVZ"
sort(LETTERS)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "Y" "J" "K" "L" "M" "N"
[16] "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Z"
vs.
"ABCDEFGHIJKLMNOPQRSTUVWX Y Z"
LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
[16] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
So the order of the sorted state abbreviations also differs (notice the last two: they should be "WV" and then "WY"; for the same reason "KY" comes before "KS" and "NY" before "NJ"):
sort(state.abb)
[1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "IA"
[13] "ID" "IL" "IN" "KY" "KS" "LA" "MA" "MD" "ME" "MI" "MN" "MO"
[25] "MS" "MT" "NC" "ND" "NE" "NH" "NY" "NJ" "NM" "NV" "OH" "OK"
[37] "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VA" "VT" "WA" "WI"
[49] "WY" "WV"
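If all you need is plain A-Z order for ASCII strings like these, one locale-independent sketch in base R is to ask sort() for its "radix" method, which (per ?sort) orders character vectors in the C locale, i.e. by byte value, whatever LC_COLLATE says. For uppercase ASCII codes such as state.abb that coincides with English alphabetical order; for accented or mixed-case text it generally will not match any natural language's order.

```r
# state.abb ships with R's datasets package, which is attached by default
abb_sorted <- sort(state.abb, method = "radix")  # C-locale (byte) order, ignores LC_COLLATE
tail(abb_sorted, 2)
# [1] "WV" "WY"

# order() accepts the same method argument, e.g. for reordering data frame rows
idx <- order(state.abb, method = "radix")
```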
I tried Sys.setlocale("LC_TIME", "English_United States.1252"). It helped to get English weekday names in plots, graphs and figures.
Now I need help sorting correctly in the "English" way.
If you know of other places where R behaves language-dependently, and how to deal with them, please list those too.
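As a starting point, here is a short base-R sketch that reports which locale each standard category currently uses; any category set to a Lithuanian locale is one that can change R's behaviour. LC_MESSAGES is left out because its support varies between platforms, and the date 2024-01-01 is just an arbitrary example.

```r
# Standard locale categories: collation, character classes, currency, numbers, dates
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
current <- vapply(categories, Sys.getlocale, character(1))  # named by category
print(current)

# Date formatting follows LC_TIME: the weekday name of a Monday in the current locale
format(as.Date("2024-01-01"), "%A")
```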
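LC_TIME controls date/time formatting, such as weekday and month names; it does not affect collation (sort order), which is governed by LC_COLLATE. For your purposes, setting LC_ALL, which covers LC_COLLATE along with every other category, should do the trick: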
# Sys.setlocale() changes the locale of the running R session
Sys.setlocale("LC_ALL", "English_United States.1252")
sort(letters)
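Because Sys.setlocale() changes the setting for the rest of the session, it can be convenient to switch collation only around the sort itself. The helper below, with_collation(), is a hypothetical name, not part of base R; it is a sketch that relies on on.exit() to restore the previous LC_COLLATE value even if the expression fails.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_COLLATE setting,
# restoring the previous collation afterwards (even if `expr` throws an error).
with_collation <- function(locale, expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the OS rejects the name
  if (identical(Sys.setlocale("LC_COLLATE", locale), "")) {
    stop("locale '", locale, "' is not available on this system")
  }
  expr  # lazy evaluation: `expr` is only computed here, under the new locale
}

# "C" exists on every platform; for ASCII capitals it matches English order
with_collation("C", sort(state.abb))
```

The withr package provides a ready-made equivalent, withr::with_collate(), if adding a dependency is acceptable.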
However, beware that locale names are operating-system specific. The name above would, for instance, not work on a typical Unix system; there, the string "en_US.UTF-8" is generally a good setting. Under Windows, however, a UTF-8 locale may itself pose problems, as R's Unicode support on Windows has historically been sketchy (R 4.2.0 and later use UTF-8 as the native encoding on recent Windows versions).
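To cope with the different naming schemes, one option is to choose the locale name from .Platform$OS.type and fall back to the always-available "C" locale when the English one is not installed. set_english_collation() is a hypothetical helper, and the two names it tries are simply the ones mentioned above; other systems may spell them differently (on Unix-like systems, locale -a lists the installed ones).

```r
# Hypothetical helper: pick an OS-appropriate English locale name for collation,
# falling back to "C" if that locale is not installed on this machine.
set_english_collation <- function() {
  name <- if (.Platform$OS.type == "windows") {
    "English_United States.1252"  # Windows-style locale name
  } else {
    "en_US.UTF-8"                 # POSIX-style name used on Linux/macOS
  }
  res <- suppressWarnings(Sys.setlocale("LC_COLLATE", name))
  if (identical(res, "")) {
    # Requested locale unavailable: "C" always exists and sorts ASCII A-Z in English order
    res <- Sys.setlocale("LC_COLLATE", "C")
  }
  invisible(res)
}

set_english_collation()
sort(state.abb)
```

Only LC_COLLATE is changed here, leaving LC_CTYPE (and thus the character encoding) alone, which sidesteps the encoding concerns mentioned above.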