I'm trying to take the column "Names" from my data frame and change the names that occur infrequently to "Others", in order to simplify a later Java program. For example:
someValue Names
1 Ramon
2 Alex
4 Ramon
1 Luke
2 Han
3 Leia
4 Luke
8 Ramon
20 Luke
Now the names that occur fewer than 3 times have to become "Others":
someValue Names
1 Ramon
2 Others
4 Ramon
1 Luke
2 Others
3 Others
4 Luke
8 Ramon
20 Luke
I am a little lost with this. I hope someone knows a quick way to do it. Thanks in advance!
You can use the table function to calculate the frequencies, and then find the names whose frequencies are too low.
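An example using character strings (here a name is kept only if it occurs more than 3 times; use n.tab >= 3 to get the threshold from the question):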
set.seed(123)
df <- data.frame(
someValue = 1:50,
Names = sample(LETTERS, 50, TRUE),
stringsAsFactors = FALSE
)
n.tab <- table( df$Names )                          # frequency of each name
n.many <- names( n.tab[ n.tab > 3] )                # names that occur more than 3 times
df[ !(df$Names %in% n.many), "Names"] <- "Others"   # relabel every other row
df
Or the same example, but with a factor:
set.seed(123)
df <- data.frame(
someValue = 1:50,
Names = sample(LETTERS, 50, TRUE),
stringsAsFactors = TRUE  # required in R >= 4.0, where the default is FALSE
)
n.tab <- table( df$Names )                          # frequency of each level
n.many <- names( n.tab[ n.tab > 3] )                # levels that occur more than 3 times
levels(df$Names)[ !(levels(df$Names) %in% n.many) ] <- "Others"   # merge all other levels into one
df
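Applied to the data from the question, the character-string version might look like the sketch below. It assumes the threshold is "at least 3 occurrences", as in your expected output, and min.count is just a name I made up for that cutoff.

```r
# The data from the question, with Names kept as character strings
df <- data.frame(
  someValue = c(1, 2, 4, 1, 2, 3, 4, 8, 20),
  Names = c("Ramon", "Alex", "Ramon", "Luke", "Han",
            "Leia", "Luke", "Ramon", "Luke"),
  stringsAsFactors = FALSE
)

min.count <- 3                               # hypothetical name: keep names seen at least this often
n.tab <- table(df$Names)                     # frequency of each name
n.many <- names(n.tab[n.tab >= min.count])   # names that are frequent enough to keep
df$Names[!(df$Names %in% n.many)] <- "Others"
df
```

With this cutoff Ramon and Luke (3 occurrences each) are kept, while Alex, Han and Leia become "Others".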