
R: reassign values in a column depending on their frequency


I'm trying to take the column "Names" from my data frame and change the less frequent names to "Others", in order to simplify a later Java program. For example:

someValue   Names
1           Ramon
2           Alex
4           Ramon
1           Luke
2           Han
3           Leia
4           Luke
8           Ramon
20          Luke

Now, the names that occur fewer than 3 times have to become "Others":

someValue   Names
1           Ramon
2           Others
4           Ramon
1           Luke
2           Others
3           Others
4           Luke
8           Ramon
20          Luke

I'm a little lost with this. I hope someone knows a quick way to do it. Thanks in advance!


Solution

  • You can use the table function to count how often each name occurs, and then replace the names whose counts are too low.
    An example using a character column:

    set.seed(123)
    df <- data.frame(
        someValue = 1:50,
        Names = sample(LETTERS, 50, TRUE),
        stringsAsFactors = FALSE
    )
    # Count how often each name occurs
    n.tab <- table(df$Names)
    # Keep the names that occur more than 3 times
    n.many <- names(n.tab[n.tab > 3])
    # Every other name becomes "Others"
    df[!(df$Names %in% n.many), "Names"] <- "Others"
    df
    

    Or the same example, but with a factor:

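    set.seed(123)
    df <- data.frame(
        someValue = 1:50,
        Names = sample(LETTERS, 50, TRUE),
        # Needed since R 4.0.0, where the default became FALSE
        stringsAsFactors = TRUE
    )
    # Count how often each level occurs
    n.tab <- table(df$Names)
    n.many <- names(n.tab[n.tab > 3])
    
    # Renaming all rare levels to "Others" merges them into a single level
    levels(df$Names)[!(levels(df$Names) %in% n.many)] <- "Others"
    df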
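Applied to the data from the question, the same table approach needs ">= 3" rather than "> 3": the question keeps names that occur at least 3 times, and with "> 3" Ramon and Luke (3 occurrences each) would also become "Others". A minimal sketch, assuming the question's data is typed in by hand as a character column:

```r
# The question's data, typed in by hand as a character column
df <- data.frame(
    someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
    Names = c("Ramon", "Alex", "Ramon", "Luke", "Han",
              "Leia", "Luke", "Ramon", "Luke"),
    stringsAsFactors = FALSE
)

# Count how often each name occurs
n.tab <- table(df$Names)

# Keep names that occur at least 3 times; fewer than 3 becomes "Others"
n.many <- names(n.tab[n.tab >= 3])

# Replace every name that is not kept
df[!(df$Names %in% n.many), "Names"] <- "Others"
df
```

Alex, Han and Leia each occur once, so they become "Others", which reproduces the expected output in the question.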
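The factor variant behaves the same way on the question's data. A minimal sketch, again assuming the data is typed in by hand, this time as a factor. It relies on levels<- merging levels that are given the same new name, so the three rare levels collapse into one "Others" level:

```r
# The question's data, typed in by hand as a factor column
df <- data.frame(
    someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
    Names = c("Ramon", "Alex", "Ramon", "Luke", "Han",
              "Leia", "Luke", "Ramon", "Luke"),
    stringsAsFactors = TRUE
)

# Count how often each level occurs
n.tab <- table(df$Names)

# Levels that occur at least 3 times are kept
n.many <- names(n.tab[n.tab >= 3])

# Rename the rare levels; duplicated names are merged into one level
levels(df$Names)[!(levels(df$Names) %in% n.many)] <- "Others"
levels(df$Names)
df
```

Because only the levels are renamed, this touches 3 level labels rather than every row, and the column stays a factor with the levels Others, Luke and Ramon.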