r dplyr recode

R way to automatically assign numeric value to categorical column for modeling


This is similar to questions in other posts, but I'm looking for something more automated than recode and similar solutions.

I have a column with many categories (e.g., city) and would like to create a new column in R that automatically assigns each city a numeric value, like this:

City    CityCode
New York  0
New York  0
Boston    1
Boston    1
Chicago   2
New Haven 3

I have about 1,000 cities, so it doesn't make sense to encode each one individually.


Solution

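  • data$CityCode = as.integer(factor(data$City)) will work; by default the codes follow the alphabetical order of the cities. To number them in the order they first occur in your data, use data$CityCode = as.integer(factor(data$City, levels = unique(data$City))). Either way the codes start at 1, so subtract 1 if you want them to start at 0 as in your example.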

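    There are very few modeling applications for which this is a good idea. (I'm having trouble thinking of any...) Integer codes give the cities an arbitrary order and spacing, and a model that takes the column as numeric will treat both as meaningful. R's modeling functions such as lm() and glm() accept a factor directly and expand it into indicator variables, so leaving City as a factor is usually the safer choice. Make sure you know what you're doing.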
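
    As a minimal sketch of the two variants above, the snippet below rebuilds the six-row example from the question. The data frame contents and the column name CityCodeAlpha are illustrative assumptions, not part of the original post.

```r
# Hypothetical example data, mirroring the table in the question
data <- data.frame(
  City = c("New York", "New York", "Boston", "Boston", "Chicago", "New Haven"),
  stringsAsFactors = FALSE
)

# Default factor() levels are sorted, so codes follow alphabetical order
# and start at 1 (CityCodeAlpha is an illustrative column name)
data$CityCodeAlpha <- as.integer(factor(data$City))

# Levels taken in order of first appearance; subtracting 1L shifts the
# codes to start at 0, matching the table in the question
data$CityCode <- as.integer(factor(data$City, levels = unique(data$City))) - 1L

print(data)
```

    If you only need the first-appearance codes and not the factor itself, match(data$City, unique(data$City)) - 1L should give the same result.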