I am curious about the behaviour of transform. Here are two ways I might try to create a new column as character rather than factor:
x <- data.frame(Letters = LETTERS[1:3], Numbers = 1:3)
y <- transform(x, Alphanumeric = as.character(paste(Letters, Numbers)))
x$Alphanumeric = with(x, as.character(paste(Letters, Numbers)))
x
y
str(x$Alphanumeric)
str(y$Alphanumeric)
The results "look" the same:
> x
  Letters Numbers Alphanumeric
1       A       1          A 1
2       B       2          B 2
3       C       3          C 3
> y
  Letters Numbers Alphanumeric
1       A       1          A 1
2       B       2          B 2
3       C       3          C 3
But look inside and only one has worked:
> str(x$Alphanumeric) # did convert to character
chr [1:3] "A 1" "B 2" "C 3"
> str(y$Alphanumeric) # but transform didn't
Factor w/ 3 levels "A 1","B 2","C 3": 1 2 3
I didn't find ?transform very useful for explaining this behaviour (presumably Alphanumeric was coerced back to a factor), nor could I find a way to stop it (something like stringsAsFactors = FALSE for data.frame). What is the safest way to do this? Are there similar pitfalls to beware of, for instance with the apply or plyr functions?
This is not so much an issue with transform as it is with data.frame, which transform calls to attach the new column and where stringsAsFactors defaults to TRUE (in R versions before 4.0.0; the default has been FALSE since 4.0.0). Pass stringsAsFactors = FALSE and you'll be on your way:
y <- transform(x, Alphanumeric = paste(Letters, Numbers),
               stringsAsFactors = FALSE)
str(y)
# 'data.frame': 3 obs. of 3 variables:
# $ Letters : Factor w/ 3 levels "A","B","C": 1 2 3
# $ Numbers : int 1 2 3
# $ Alphanumeric: chr "A 1" "B 2" "C 3"
I generally use within instead of transform, and it does not seem to have this problem:
y <- within(x, {
  Alphanumeric = paste(Letters, Numbers)
})
str(y)
# 'data.frame': 3 obs. of 3 variables:
# $ Letters : Factor w/ 3 levels "A","B","C": 1 2 3
# $ Numbers : int 1 2 3
# $ Alphanumeric: chr "A 1" "B 2" "C 3"
This is because it takes an approach similar to your with approach: create a character vector and add it (via [<-) to the existing data.frame.
You can view the source of each of these by typing transform.data.frame and within.data.frame at the prompt.
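The difference between the two mechanisms can be sketched outside of transform and within. This is only an approximation of what the two functions do internally, not their actual source; the names new_col, rebuilt and assigned are made up for the illustration, and stringsAsFactors = TRUE is spelled out so the result does not depend on which R version's default applies.

```r
x <- data.frame(Letters = LETTERS[1:3], Numbers = 1:3)
new_col <- paste(x$Letters, x$Numbers)

# Roughly the transform() route: the new column goes through data.frame(),
# which converts character vectors to factors when stringsAsFactors is TRUE.
rebuilt <- data.frame(x, Alphanumeric = new_col, stringsAsFactors = TRUE)

# Roughly the within() route: the column is assigned into the existing
# data frame with `[<-`, which leaves a character vector as it is.
assigned <- x
assigned["Alphanumeric"] <- list(new_col)

class(rebuilt$Alphanumeric)
class(assigned$Alphanumeric)
```

The first class() call should report a factor and the second a character vector, which mirrors the difference between y and x in the question.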
As for other pitfalls, that is much too broad a question. One thing that comes to mind right away is that apply would create a matrix from a data.frame, so all the columns would be coerced to a single type.
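A small sketch of that apply pitfall, using the same x as in the question. The variable names m, row_classes and col_classes are only for illustration.

```r
x <- data.frame(Letters = LETTERS[1:3], Numbers = 1:3)

# apply() works on a matrix, and a matrix can hold only one type,
# so the integer column is converted to character along with Letters.
m <- as.matrix(x)
typeof(m)

# Each row therefore reaches the function as a character vector.
row_classes <- apply(x, 1, class)
row_classes

# Iterating over columns with sapply() keeps each column's own type.
col_classes <- sapply(x, class)
col_classes
```

If a row-wise operation needs the original column types, working column-wise (for example with sapply, or with vectorised expressions inside within) avoids the matrix conversion.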