
Extracting unique values from data frame using R


I have a data frame with multiple columns, and I want to isolate two of the columns and get the total number of unique values across them. Here's an example of what I mean:

Let's say I have a data frame df:

df <- data.frame(v1 = c(1, 2, 3, 2, "a"), v2 = c("a", 2, "b", "b", 4))
df

  v1 v2
1  1  a
2  2  2
3  3  b
4  2  b
5  a  4

What I'm trying to do is extract just the unique values across the two columns. If I just use unique() on each column, the output looks like this:

> unique(df[,1])
[1] 1 2 3 a
> unique(df[,2])
[1] a 2 b 4

But this is no good, as it only finds the unique values per column, whereas I need the total number of unique values across the two columns. For instance, 'a' appears in both columns, but I only want it counted once. As an example of the output I need, imagine the columns v1 and v2 are stacked on top of each other like so:

  V1_V2
1      1
2      2
3      3
4      2
5      a
6      a
7      2
8      b
9      b
10     4

The unique values of V1_V2 would be:

   V1_V2
1      1
2      2
3      3
5      a
8      b
10     4

Then I could just count the rows using nrow(). Any ideas how I'd achieve this?


Solution

  • This is well suited to union(), which returns the distinct values found in either vector:

    data.frame(V1_V2=union(df$v1, df$v2))
    
    #  V1_V2
    #1     1
    #2     2
    #3     3
    #4     a
    #5     b
    #6     4
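
Since the goal in the question is only the count, the data frame and nrow() step can probably be skipped. The following is a minimal sketch rather than a definitive solution: it rebuilds the example data with stringsAsFactors = FALSE (an assumption, so the columns stay character vectors on R versions before 4.0), and the names combined, n_unique and n_unique_alt are made up for illustration.

```r
# Same example data as in the question; stringsAsFactors = FALSE is an
# assumption that keeps both columns as character vectors on older R versions.
df <- data.frame(v1 = c(1, 2, 3, 2, "a"),
                 v2 = c("a", 2, "b", "b", 4),
                 stringsAsFactors = FALSE)

# Distinct values across both columns
combined <- union(df$v1, df$v2)

# Count them directly, without building a data frame and calling nrow()
n_unique <- length(combined)

# An alternative that extends to more than two columns:
# unlist() stacks the selected columns into one vector before unique()
n_unique_alt <- length(unique(unlist(df[c("v1", "v2")], use.names = FALSE)))
```

union() only takes two vectors at a time, so the unlist() form may be the more convenient one if further columns need to be included later.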