This is similar to other posts, but I'm looking for a more automated solution than recode and similar approaches.
I have a column with many categories (e.g., cities) and would like to create a new column in R that automatically assigns each city a numeric code, like this:
City        CityCode
New York    0
New York    0
Boston      1
Boston      1
Chicago     2
New Haven   3
I have about 1,000 cities, so it doesn't make sense to encode each one individually.
data$CityCode = as.integer(factor(data$City))
will work, ordering the cities alphabetically by default. To put them in the order they occur in your data, use
data$CityCode = as.integer(factor(data$City, levels = unique(data$City)))
Either way the codes start at 1; subtract 1 if you want them to start at 0 as in your example.
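As a rough sketch of how the two encodings behave, here is a toy data frame built from the cities in the question. The column name CityCodeAlpha is made up for illustration, and the `- 1L` shift is only there to make the codes start at 0 like the example table.

```r
# Toy data mirroring the question's example (illustrative only).
data <- data.frame(
  City = c("New York", "New York", "Boston", "Boston", "Chicago", "New Haven"),
  stringsAsFactors = FALSE
)

# Default factor levels are sorted, so codes follow alphabetical order
# and start at 1. CityCodeAlpha is a hypothetical column name.
data$CityCodeAlpha <- as.integer(factor(data$City))

# Levels in order of first appearance; subtract 1 so codes start at 0.
data$CityCode <- as.integer(factor(data$City, levels = unique(data$City))) - 1L

print(data$CityCodeAlpha)  # 4 4 1 1 2 3
print(data$CityCode)       # 0 0 1 1 2 3
```

For the order-of-appearance case, `match(data$City, unique(data$City)) - 1L` should give the same codes without going through a factor.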
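There are very few modeling applications for which this is a good idea. (I'm having trouble thinking of any...) The integer codes imply an order and spacing between cities that a model will treat as meaningful, even though they are arbitrary. Make sure you know what you're doing.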