r, dataframe, r-factor

Create a new level column based on unique row sets


I want to create a new column of labels (preferably letters), one per unique combination of row values, so that I can count the frequency of each combination later on.

Let's say I have a data frame called datatemp that looks like this:

 datatemp = data.frame(colors=rep( c("red","blue"), 6), val = 1:6)
    colors val
1     red   1
2    blue   2
3     red   3
4    blue   4
5     red   5
6    blue   6
7     red   1
8    blue   2
9     red   3
10   blue   4
11    red   5
12   blue   6

I can list the unique row sets, i.e. the distinct combinations of the colors and val columns, like this:

 unique(datatemp[c("colors","val")]) 
   colors val
1    red   1
2   blue   2
3    red   3
4   blue   4
5    red   5
6   blue   6

What I really want is to create a new column in the same data frame in which each of the unique row sets above gets its own level, such as:

    colors val freq
1     red   1   A
2    blue   2   B
3     red   3   C
4    blue   4   D
5     red   5   E
6    blue   6   F
7     red   1   A
8    blue   2   B
9     red   3   C
10   blue   4   D
11    red   5   E
12   blue   6   F

I know that's very basic, but I couldn't come up with a useful approach for a huge dataset.

To make the question clearer, here is another representation of the desired output:

   colA     colB  newcol
    10        11     A
    12        15     B
    10        11     A
    13        15     C

Values in the new column should be based on the unique combinations of the two columns before it.


Solution

  • www's solution maps the unique values in your val column to letters in the freq column. If you want to create a factor variable for each unique combination of colors and val, you could do something along these lines:

    library(plyr)  # provides mapvalues()
    datatemp <- data.frame(colors = rep(c("red", "blue"), 6), val = 1:6)
    # One key per row; levels follow the order of first appearance
    key <- paste(datatemp$colors, datatemp$val)
    datatemp$freq <- factor(key, levels = unique(key))
    # Rename the levels to letters (LETTERS only covers up to 26 levels)
    datatemp$freq <- mapvalues(datatemp$freq, from = levels(datatemp$freq), to = LETTERS[seq_along(levels(datatemp$freq))])
    

    I first create a new factor variable for each unique combination of val and colors, and then use plyr::mapvalues to rename the factor levels to letters.
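
    If you would rather not depend on plyr, the same idea can be sketched in base R with match(). This is only an illustration under a few assumptions: the names key, ukey, id, labels and counts are made up for the example, the "|" separator is assumed not to occur in your data, and the fallback to numeric labels beyond 26 combinations is my own choice, not something from the question.

```r
# Same example data as in the question
datatemp <- data.frame(colors = rep(c("red", "blue"), 6), val = 1:6)

# One key per row; "|" is an assumed separator that must not appear in the data
key  <- paste(datatemp$colors, datatemp$val, sep = "|")
ukey <- unique(key)

# match() gives, for each row, the position of its key among the unique keys,
# so identical rows share the same integer id (in order of first appearance)
id <- match(key, ukey)

# LETTERS has only 26 entries; use plain numbers as labels beyond that
labels <- if (length(ukey) <= 26) LETTERS[seq_along(ukey)] else as.character(seq_along(ukey))
datatemp$freq <- factor(labels[id], levels = labels)

# Frequency of each unique row set
counts <- table(datatemp$freq)
```

    Because match() works on the integer positions directly, no level renaming step is needed, and table() then gives the count for each combination, which was the stated end goal.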