Search code examples
runique

Trouble finding unique values


I have a problem with unique command. When I run unique for each variable I have no problems , but when I try to run for more than one variable I get trouble.

This is my database:

structure(list(M1 = c("AMZN PE Equity", "ANDAHUC1 PE", "ANDAHUC1 PE", 
"ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity"), M2 = c("AMZN PE Equity", 
"ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity")), row.names = c(NA, 13L), class = "data.frame")

Both variables( M1, M2) have the same characters but are repeated in different proportions( ALICPRLl1 Pe Equity is repeated six times in M1 , but is repeated seven times in M2).

This is my code for more than one variable:

morethan1<- unique(testrep[c("M1", "M2")])

and this is my output:

      M1                       M2
1      AMZN PE Equity        AMZN PE Equity
2        ANDAHUC1 PE        ANDAHUC1 PE
7         ANDAHUC1 PE       ALICPRLl1 Pe Equity
8 ALICPRLl1 Pe Equity       ALICPRLl1 Pe Equity

As you can see there are some repeated names yet.

This is my code for just one variable:

just1 <- unique(testrep[c("M1")])

I have no problem in this case:

         M1
1      AMZN PE Equity
2         ANDAHUC1 PE
8 ALICPRLl1 Pe Equity

Also I have the numbers (1 , 2 , 8), I want numbers to appear in order(1 ,2 3)

Finally, If I have 100 columns( M1 to M100) how can I modify my first code(M1:M00)?

Another way:

bothvariables <- subset(unique(testrep), M1 != M2)

This is my output:

         M1                           M2
7 ANDAHUC1 PE               ALICPRLl1 Pe Equity

But I want this output:

         M1                            M2
      AMZN PE Equity             AMZN PE Equity
         ANDAHUC1 PE             ANDAHUC1 PE
 ALICPRLl1 Pe Equity             ALICPRLl1 Pe Equity

No matter if the names are repeated different number of times in M1, M2 I want an output that show me the unique names inside M1 and M2 columns.

Now I try with a one more column : M3

This is my new database:

structure(list(M1 = c("AMZN PE Equity", "ANDAHUC1 PE", "ANDAHUC1 PE", 
"ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity"), M2 = c("AMZN PE Equity", 
"ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity"), M3 = c("AMZN PE Equity", "AMZN PE Equity", 
"AMZN PE Equity", "ANDAHUC1 PE", "ANDAHUC1 PE", "ANDAHUC1 PE", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", "ALICPRLl1 Pe Equity", 
"ALICPRLl1 Pe Equity")), row.names = c(NA, 13L), class = "data.frame")

This is the code:

testrep %>%
    distinct(M1, .keep_all = TRUE)

This is the output:

                   M1                  M2                  M3
1      AMZN PE Equity      AMZN PE Equity      AMZN PE Equity
2         ANDAHUC1 PE         ANDAHUC1 PE      AMZN PE Equity
3 ALICPRLl1 Pe Equity ALICPRLl1 Pe Equity ALICPRLl1 Pe Equity

I have the same problem now, just because I added a new column.

this is my expected output:

               M1                M2                  M3
1      AMZN PE Equity      AMZN PE Equity      AMZN PE Equity
2         ANDAHUC1 PE         ANDAHUC1 PE      ANDAHUC1 PE  
3 ALICPRLl1 Pe Equity   ALICPRLl1 Pe Equity    ALICPRLl1 Pe Equity

Solution

  • We can use paste

     testrep[paste0("M", 1:100)]
    

    Or with startsWith

    testrep[startsWith(names(testrep), "M")]
    

    Or using grep

    testrep[grep("^M\\d+$", names(testrep))]
    

    Regarding the row number issue, it can be set to NULL to get the default numbering

    row.names(just1) <- NULL
    

    We can use distinct with .keep_all

    library(dplyr)
    testrep %>%
        distinct(M1, .keep_all = TRUE)
    #                   M1                  M2
    #1      AMZN PE Equity      AMZN PE Equity
    #2         ANDAHUC1 PE         ANDAHUC1 PE
    #3 ALICPRLl1 Pe Equity ALICPRLl1 Pe Equity
    

    We could get the unique from. each column separately and then unnest it

    library(tidyr)
    testrep2 %>%
         summarise_all(list(~ list(unique(.)))) %>%
         unnest(everything())
    # A tibble: 3 x 3
    #  M1                  M2                  M3                 
    #  <chr>               <chr>               <chr>              
    #1 AMZN PE Equity      AMZN PE Equity      AMZN PE Equity     
    #2 ANDAHUC1 PE         ANDAHUC1 PE         ANDAHUC1 PE        
    #3 ALICPRLl1 Pe Equity ALICPRLl1 Pe Equity ALICPRLl1 Pe Equity