Tags: r, unique, plyr, distinct-values

Function does not work embedded in another function


My dataset contains IDs that have more than one distinct name. To detect them I built this call:

ddply(my_dataframe, ~ID_col, summarise, number_of_names = length(unique(names_col)))

That works just fine: I get a table with the ID in the first column and its number of distinct names in the second.

Because I need to do this for several ID/name pairs, I decided to wrap the ddply call in a function. I did it as follows:

function_name = function (source, id, name) {
  ddply(source, ~id, summarise, number_of_names = length(unique(name)))
}

Unfortunately, this throws an error when I use it:

function_name(my_dataframe, ID_col, names_col)
# Error in unique.default(x) : unique() applies only to vectors

As you can see, it is exactly the same code as before, just embedded in a function with three arguments. I am desperate to fix it and am really looking forward to a solution.

FYI: In my original code I did not use "source" or "name" but German words, so there should be no clashes with existing functions. I have also already tried putting the variables in quotes.

Thanks for any help!

This is roughly what the data frame looks like:

my_dataframe <- data.frame(
  ID_col = c(letters[2:9], letters[3:4]),
  names_col = paste0("name-", letters[1:10])
)

There are 303 IDs but 963 names.


Solution

  • R has always supported selecting a column whose name is stored in a variable by using double square brackets, e.g. source[[name]]. If you pass the column names as strings, you can do it with tapply this way:

    function_name = function (source, id, name) {
        data.frame(
            N = tapply(
                source[[name]],   # values to count
                source[[id]],     # grouping column, taken from the argument, not the global my_dataframe
                function(x) {
                    length(unique(x))
                }
            )
        )
    }
    

    Then:

    > function_name(my_dataframe,"ID_col","names_col")
          N
    FU181 2
    FU901 1
    FU992 1
    

    Note that the IDs end up in the row names of the returned data frame, not in a column of their own.
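
The original wrapper most likely fails because ddply and summarise treat id and name as literal column names instead of substituting the values passed in, whereas [[ ]] takes an ordinary string. As a minimal sketch of the same idea, here is a base R variant run on the sample data frame from the question. The function name count_distinct_names and the output column names id and number_of_names are illustrative choices, not part of the original answer; the only difference from the function above is that the IDs are returned as a regular column.

```r
# Sketch only: count_distinct_names and its output column names are hypothetical.
count_distinct_names <- function(source, id, name) {
  # Group the name column by the id column and count distinct names per group.
  counts <- tapply(source[[name]], source[[id]], function(x) length(unique(x)))
  # Put the IDs into a proper column instead of the row names.
  data.frame(
    id = names(counts),
    number_of_names = as.vector(counts),
    stringsAsFactors = FALSE
  )
}

# Sample data from the question.
my_dataframe <- data.frame(
  ID_col = c(letters[2:9], letters[3:4]),
  names_col = paste0("name-", letters[1:10])
)

# Column names are passed as strings.
result <- count_distinct_names(my_dataframe, "ID_col", "names_col")

# Keep only the IDs with more than one distinct name;
# in this sample those are "c" and "d", with 2 names each.
result[result$number_of_names > 1, ]
```

Because the column names are plain strings, the same function can be reused for any other ID/name pair without touching its body.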