I have some very simple code, and I do not understand why it is not working the way I want. Basically, I have a data frame and want to capture the value of the n-th element of a column in the data frame and store it in a vector. Here is my code:
COL1_VALUES <- c("ABC","XYZ","PQR")
COL2_VALUES <- c("DEF","JKL","TSM")
means <- data.frame(COL1_VALUES,COL2_VALUES)
for (i in 1:nrow(means)) {
COL1_VALUES[i] <- means$COL1[i];
COL2_VALUES[i] <- means$COL2[i];
}
print(means$COL1)
print(COL1_VALUES)
This outputs:
[1] ABC XYZ PQR
Levels: ABC PQR XYZ
[1] "1" "3" "2"
Why am I not getting ABC XYZ PQR in the vector COL1_VALUES? It appears that 1, 3, 2 are the indices of ABC, XYZ, PQR in means$COL1. What do I need to do to get ABC XYZ PQR in the vector COL1_VALUES?
Thanks.
In R versions before 4.0.0, the data.frame() function ships with a default setting of stringsAsFactors=TRUE (from 4.0.0 onwards the default is FALSE). With the old default, all input character vectors are implicitly converted into so-called "factors" when creating a data.frame.
A factor is somewhat like a vector of integers plus text labels that describe those integers. For example, if the column gender has type factor, it is actually a vector of integers with 1s and 2s plus an attached dictionary saying that category id 1 means Male and category id 2 means Female, or vice versa.
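As a small illustration of the description above, the sketch below builds a factor by hand and inspects its two parts. The variable names f and out are made up for this example; factor(), levels(), as.integer() and as.character() are base R.

```r
# Build the factor explicitly so the example behaves the same in any R version.
f <- factor(c("ABC", "XYZ", "PQR"))

levels(f)        # the label dictionary, sorted: "ABC" "PQR" "XYZ"
as.integer(f)    # the underlying integer codes: 1 3 2
as.character(f)  # the labels looked up from the codes

# Element-wise assignment into a character vector copies the codes,
# not the labels, which reproduces the output seen in the question.
out <- character(length(f))
for (i in seq_along(f)) {
  out[i] <- f[i]
}
print(out)       # "1" "3" "2"
```

The codes 1, 3, 2 are positions in the sorted levels, which is why XYZ maps to 3 and PQR to 2.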
This default setting of stringsAsFactors is a sneaky beast and can show up in numerous unexpected locations. In most of these cases, it helps just to add an explicit stringsAsFactors=FALSE option so as to keep character vectors as character vectors.
Below I list the functions that I personally struggled with until realising that all I was missing was the stringsAsFactors=FALSE option:
data.frame
read.csv, read.table and other read.* functions
expand.grid
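A minimal sketch of passing the option to two of the functions listed above. The inline CSV text and the names csv_df and grid_df are invented for illustration; the text argument of read.csv is used only so that the example does not need a file on disk.

```r
# read.csv: parse an inline string instead of a file, keeping strings as strings.
csv_df <- read.csv(text = "name,code\nABC,DEF\nXYZ,JKL",
                   stringsAsFactors = FALSE)
print(class(csv_df$name))     # "character"

# expand.grid: build all combinations without converting letters to a factor.
grid_df <- expand.grid(letter = c("a", "b"), digit = 1:2,
                       stringsAsFactors = FALSE)
print(class(grid_df$letter))  # "character"
```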
In your specific example above, what you need to do is find this line:
means <- data.frame(COL1_VALUES,COL2_VALUES)
and replace it with:
means <- data.frame(COL1_VALUES,COL2_VALUES,
stringsAsFactors=FALSE)
so that you are explicitly requesting data.frame() not to do any implicit conversions behind your back.
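For completeness, here is a sketch of the question's script with that one change applied. It copies into a separate vector (col1_copy, a name introduced here) so the effect is easier to see. The last part shows as.character(), a base R alternative that is not part of the fix above, for the case where a data frame already contains factor columns.

```r
COL1_VALUES <- c("ABC", "XYZ", "PQR")
COL2_VALUES <- c("DEF", "JKL", "TSM")

# Keep the columns as plain character vectors.
means <- data.frame(COL1_VALUES, COL2_VALUES, stringsAsFactors = FALSE)

col1_copy <- character(nrow(means))
for (i in seq_len(nrow(means))) {
  col1_copy[i] <- means$COL1_VALUES[i]
}
print(col1_copy)  # "ABC" "XYZ" "PQR"

# Alternative for a data frame that already holds factors:
# convert the column back to its labels with as.character().
means_f <- data.frame(COL1_VALUES, COL2_VALUES, stringsAsFactors = TRUE)
labels_from_factor <- as.character(means_f$COL1_VALUES)
print(labels_from_factor)  # "ABC" "XYZ" "PQR"
```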
You can also avoid this conversion by changing the global option at the beginning of each R session:
options(stringsAsFactors = FALSE)
Note, however, that modifying this global option only affects your own machine, so snippets of your code that rely on it may stop working on other people's machines.
This answer contains more information about how to disable it permanently.