In my dataset there are IDs with more than one distinct name. To detect them I built this call:
library(plyr)
ddply(my_dataframe, ~ID_col, summarise, number_of_names = length(unique(names_col)))
That works fine: I get a table with the ID in the first column and its number of distinct names in the second.
Because I need to do this for several ID/name pairs, I decided to wrap the ddply call in a function. I did it as follows:
function_name = function (source, id, name) {
  ddply(source, ~id, summarise, number_of_names = length(unique(name)))
}
Unfortunately, this throws an error when I use it:
function_name(my_dataframe, ID_col, names_col)
# Error in unique.default(x) : unique() applies only to vectors
As you can see, it is exactly the same code as before, just wrapped in a function with three arguments. I am stuck on this and would really appreciate a solution.
FYI: in my original code I did not use "source" or "name" but German words, so there should be no clashes with existing functions. I have also already tried putting the arguments in quotes.
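A likely explanation, offered as a hedged reading rather than a trace of plyr's internals: the formula `~id` and the `summarise` expression `length(unique(name))` are evaluated with non-standard evaluation, so the symbols `id` and `name` are resolved against the columns of the data frame (and then enclosing environments) instead of being replaced by the column names you passed in. The same lookup rule can be seen with base R's `with()`; the toy data frame `df` and the variable `col` are made up for illustration.

```r
# Toy data, made up for illustration
df <- data.frame(ID_col = c("a", "a", "b"), names_col = c("x", "y", "z"))
col <- "names_col"

# Non-standard evaluation: `col` is looked up as a column of df first.
# df has no column called "col", so R falls back to the calling environment
# and unique() only ever sees the single string "names_col".
res_nse <- with(df, length(unique(col)))   # 1

# Standard evaluation: [[ ]] uses the *value* stored in `col`,
# i.e. it extracts the column named "names_col".
res_se <- length(unique(df[[col]]))        # 3
```

Wrapping NSE code in a function therefore needs the column names to travel as values (strings) and be used with `[[`.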
Thanks for any help!
This is roughly what the data frame looks like:
my_dataframe <- data.frame(
ID_col = c(letters[2:9], letters[3:4]),
names_col = paste0("name-", letters[1:10])
)
In the full dataset there are 303 IDs but 963 names.
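If you would rather keep plyr, a minimal sketch is to pass the column names as strings: `ddply` accepts a character vector for its `.variables` argument, and `summarise` can be replaced by an ordinary function that extracts the name column with `[[`. The function name `count_names_plyr` is hypothetical, and the sketch assumes plyr is installed.

```r
library(plyr)

# Hypothetical helper: `id` and `name` are column names given as strings
count_names_plyr <- function(source, id, name) {
  # A character vector is a valid .variables argument, so `id` is used by value
  ddply(source, id, function(piece) {
    # `piece` is the sub-data-frame for one ID; [[ ]] looks the column up by value
    data.frame(number_of_names = length(unique(piece[[name]])))
  })
}

# Sample data from the question
my_dataframe <- data.frame(
  ID_col = c(letters[2:9], letters[3:4]),
  names_col = paste0("name-", letters[1:10])
)

res <- count_names_plyr(my_dataframe, "ID_col", "names_col")
```

`ddply` prepends the grouping column, so `res` has the columns `ID_col` and `number_of_names`, one row per ID.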
R has always been able to select a column by a name stored in a variable, using double square brackets. With tapply you can do it this way:
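function_name = function (source, id, name) {
  data.frame(
    N = tapply(
      source[[name]],     # column whose name is stored in `name`
      source[[id]],       # group by the column whose name is stored in `id`
      function(x) {
        length(unique(x)) # number of distinct names per ID
      }
    )
  )
}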
Then:
> function_name(my_dataframe,"ID_col","names_col")
N
FU181 2
FU901 1
FU992 1
Note that the IDs end up as the row names of the returned data frame, not as a separate column.
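To get back to the original goal, here is a small base-R sketch that moves the IDs into a regular column and keeps only IDs with more than one distinct name. The function name `count_names` and the column names `ID` and `N` are hypothetical choices for illustration.

```r
# Sample data from the question
my_dataframe <- data.frame(
  ID_col = c(letters[2:9], letters[3:4]),
  names_col = paste0("name-", letters[1:10])
)

# Hypothetical helper based on the tapply approach above
count_names <- function(source, id, name) {
  # Named 1-d array: one count of distinct names per ID
  counts <- tapply(source[[name]], source[[id]], function(x) length(unique(x)))
  # names() of a 1-d array are its dimnames, i.e. the IDs
  data.frame(ID = names(counts), N = as.vector(counts),
             row.names = NULL, stringsAsFactors = FALSE)
}

result <- count_names(my_dataframe, "ID_col", "names_col")

# Keep only the IDs that have more than one distinct name
dupes <- result[result$N > 1, ]
```

With the sample data, `dupes` contains the IDs c and d, each with two distinct names.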