I want to create a new column of labels (preferably letters) so that I can count the frequency of each set later on.
Let's say I have a data frame called datatemp, which looks like this:
    datatemp = data.frame(colors=rep( c("red","blue"), 6), val = 1:6)

       colors val
    1     red   1
    2    blue   2
    3     red   3
    4    blue   4
    5     red   5
    6    blue   6
    7     red   1
    8    blue   2
    9     red   3
    10   blue   4
    11    red   5
    12   blue   6
I can list the unique combinations of the colors and val columns like this:
    unique(datatemp[c("colors","val")])

      colors val
    1    red   1
    2   blue   2
    3    red   3
    4   blue   4
    5    red   5
    6   blue   6
What I really want is a new column in the same data frame in which each of the unique combinations above gets its own level, such as:
       colors val freq
    1     red   1    A
    2    blue   2    B
    3     red   3    C
    4    blue   4    D
    5     red   5    E
    6    blue   6    F
    7     red   1    A
    8    blue   2    B
    9     red   3    C
    10   blue   4    D
    11    red   5    E
    12   blue   6    F
I know this is very basic, but I couldn't come up with a useful approach for a huge dataset.
To make the question clearer, here is another representation of the desired output:
    colA colB newcol
      10   11      A
      12   15      B
      10   11      A
      13   15      C
Values in the new column should be based on the unique combinations of the two columns before it.
www's solution maps the unique values in your val column to letters in the freq column. If you want to create a factor with one level for each unique combination of colors and val, you could do something along these lines:
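    library(plyr)
    datatemp = data.frame(colors=rep( c("red","blue"), 6), val = 1:6)
    # One factor level per unique colors/val pair, in order of first appearance
    key <- paste(datatemp$colors, datatemp$val)
    datatemp$freq <- factor(key, levels = unique(key))
    # Rename the levels to letters (LETTERS has only 26 entries, so this covers at most 26 combinations)
    datatemp$freq <- mapvalues(datatemp$freq, from = levels(datatemp$freq), to = LETTERS[seq_len(nlevels(datatemp$freq))])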
I first create a factor with one level for each unique combination of val and colors, and then use plyr::mapvalues to rename the factor levels to letters.
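If you would rather avoid the plyr dependency, the same idea can be sketched in base R. This is only a rough sketch: label_combinations is a hypothetical helper name (not from any package), the "|" separator is an assumption that only holds if that character never appears in your data, and the letter labels run out after 26 combinations.

```r
# Hypothetical helper: returns a factor with one letter per unique combination
# of the given columns, lettered in order of first appearance.
label_combinations <- function(df, cols) {
  # Build one key string per row; "|" is assumed not to occur in the data.
  key <- do.call(paste, c(unname(as.list(df[cols])), sep = "|"))
  # match() against unique() numbers the combinations by first appearance.
  id <- match(key, unique(key))
  # LETTERS has only 26 entries, so stop if there are more combinations.
  stopifnot(max(id) <= length(LETTERS))
  factor(LETTERS[id], levels = LETTERS[seq_len(max(id))])
}

# The data frame from the question.
datatemp <- data.frame(colors = rep(c("red", "blue"), 6), val = 1:6)
datatemp$freq <- label_combinations(datatemp, c("colors", "val"))

# Frequency of each combination.
counts <- table(datatemp$freq)

# The second example from the question.
other <- data.frame(colA = c(10, 12, 10, 13), colB = c(11, 15, 11, 15))
other$newcol <- label_combinations(other, c("colA", "colB"))
```

Here table(datatemp$freq) gives the count for each labelled combination, which is what the new column is meant for. With more than 26 combinations you would need a different labelling scheme, for example using the integer ids directly.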