Tags: r, dataframe, dplyr, purrr

show unique values for each column


I am trying to create a data frame that lists the column type and the unique values of each column.

I am able to get the column type in the desired data frame format using map(df, class) %>% bind_rows() %>% gather(key = col_name, value = col_class), but I am unable to get the unique values into a data frame instead of a list.

Below is a small data frame and code that returns the unique values as a list, but not as a data frame. Ideally, I could do this in one (map) call, but if I have to join two results, it would not be a big deal.


df <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))

library(tidyverse)

map(df, class) %>% bind_rows() %>% gather(key = col_name, value = col_class)

map(df, unique)

When I apply the same approach to map(df, unique) that I used for map(df, class), I get the error "Error: Argument 2 must be length 3, not 2". The error itself is expected, but I am not sure how to get around it.
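The length mismatch named in that error can be checked by counting the unique values per column. This is a minimal base R sketch using the same df; the name n_unique is only illustrative.

```r
df <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# Number of unique values in each column; the counts differ (3 vs 2),
# so the per-column results cannot be stacked as equal-length vectors.
n_unique <- lengths(lapply(df, unique))
n_unique
```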


Solution

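  • The number of unique values differs between the two columns (three in v1, two in v2), so bind_rows() cannot combine them. You need to reduce each column's unique values to a single element, for example by collapsing them into one comma-separated string.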

    # Collapse each column's unique values into one string, then reshape to long format
    df2 <- map(df, ~ str_c(unique(.x), collapse = ",")) %>%
        bind_rows() %>%
        gather(key = col_name, value = col_unique)
    
    > df2
    # A tibble: 2 x 2
      col_name col_unique
      <chr>    <chr>
    1 v1       1,2,3
    2 v2       a,b
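
If the class and the unique values are wanted in a single pass, as the question asks, one possible sketch is to build a one-row tibble per column inside map() and stack the rows. The name col_summary and the use of paste() instead of str_c() are choices made for this illustration, not part of the answer above.

```r
library(purrr)
library(dplyr)
library(tibble)

df <- data.frame(v1 = c(1, 2, 3, 2), v2 = c("a", "a", "b", "b"))

# One row per column: its class and its unique values collapsed to a string.
# bind_rows(.id = ...) turns the list names (the column names) into col_name.
col_summary <- map(df, function(col) {
  tibble(
    col_class  = paste(class(col), collapse = "/"),
    col_unique = paste(unique(col), collapse = ",")
  )
}) %>%
  bind_rows(.id = "col_name")

col_summary
```

Note that the class reported for v2 depends on the R version: data.frame() converts strings to factors by default before R 4.0 and keeps them as character from 4.0 onward.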