I encountered a very weird problem when reading a .csv file into a data frame newdata using read.csv.
One of the columns is "Site", and it should be a string. But when I call typeof(newdata$Site), I get the result "integer".
When I run table(newdata$Site) and write this table to a .csv file, I get a proper frequency table for each value, with an additional numerical column (i.e. one column with no name holding numerical values, one column named Var1 with the site strings (e.g. "www.google.com") and one column named Freq with the frequency).
I tried to create a new column which combines multiple values into one (e.g. "www.google.com" and "www.google.co.uk" into "Google") and I used grepl; then I realized that R does not treat the original column as a string...
When I tried to subset this column only with a = newdata[,"Site"], I got that a is of type factor... Writing it to .csv results in one long line of all the values...
What am I doing wrong? I'm kind of new to this stuff and I really don't know what to do...
Thanks!!!
You have already dug into this quite a lot: you found that your column Site is a factor and that its typeof() is "integer".
To avoid coding strings as factors when reading in data (the default behaviour before R 4.0.0), use:
read.csv(..., stringsAsFactors = FALSE)
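As a rough sketch of the difference, the snippet below reads the same small CSV twice from an inline string instead of a file. The sample data and the names csv_text, as_factor and as_string are made up for illustration; only read.csv and its stringsAsFactors argument come from the advice above.

```r
# Hypothetical sample data standing in for the real .csv file
csv_text <- "Site,Visits
www.google.com,10
www.google.co.uk,4
www.example.org,7"

# Request factors explicitly so the result does not depend on the R version
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factor$Site)   # "factor"
typeof(as_factor$Site)  # "integer"

# Keep strings as plain character vectors
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_string$Site)   # "character"
typeof(as_string$Site)  # "character"
```

Note that class() is usually the more informative check here, since typeof() only reports how the values are stored internally.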
Factors are stored as integers, where each integer gives the position of the corresponding value in the factor's levels. Try:
x <- gl(3,2,labels=letters[1:3])
x
#[1] a a b b c c
#Levels: a b c
typeof(x)
#[1] "integer"
levels(x)
#[1] "a" "b" "c"
levels(x)[x] ## gives the same result as "as.character(x)"
#[1] "a" "a" "b" "b" "c" "c"
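If the data frame has already been read in with factors, the column can also be converted afterwards. Here is a minimal sketch of that, followed by the grouping and frequency table from the question. It assumes a small made-up data frame in place of the real newdata; the Group column, the "google" pattern and the output file name are illustrative choices, not something taken from the original data.

```r
# Hypothetical stand-in for the data frame from the question
newdata <- data.frame(
  Site = c("www.google.com", "www.google.co.uk",
           "www.example.org", "www.google.com"),
  stringsAsFactors = TRUE
)

# Convert the factor column to plain strings
newdata$Site <- as.character(newdata$Site)

# Collapse every site containing "google" into one label, keep the rest as-is
newdata$Group <- ifelse(grepl("google", newdata$Site, fixed = TRUE),
                        "Google", newdata$Site)

# Frequency table as a data frame with columns Var1 and Freq
freq <- as.data.frame(table(newdata$Group), stringsAsFactors = FALSE)

# row.names = FALSE leaves out the unnamed column of row numbers
# write.csv(freq, "site_counts.csv", row.names = FALSE)
```

The unnamed numerical column mentioned in the question is most likely the row names that write.csv adds by default, which is why the sketch passes row.names = FALSE.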