Tags: r, tidyverse, skimr

What does the n_unique value mean, when skimming list variables?


I do not understand what the n_unique value reported by skim() means for list variables:

library(tidyverse)
library(skimr)

skim(starwars)

The following is part of the result, about the three list variables in the dataset:

[Image: detail of the result of skim(starwars)]

Now, there are 10 different vehicles in the dataset, so it makes sense that n_unique is 11 (counting the empty case of Star Wars characters who use no vehicle). A character can use anywhere from zero vehicles (min_length) to two different vehicles (max_length) across the movies. There are also 16 starships, and a character can use from zero to five different starships, so it all makes sense.

However, there are only seven movies, so I would expect n_unique to be 7, not 24. The lengths do make sense: a character can appear in as few as one movie (min_length) and as many as all seven (max_length).


Solution

  • There are 7 unique individual films, but 24 unique list elements: the count compares whole list elements (each character's full set of films) with one another, not the individual films inside them.

    For example, if the first element is [The Phantom Menace, Revenge of the Sith] and the second element is [The Phantom Menace], then the two elements are different.

    library(tidyverse)
    library(skimr)
    
    # Count unique individual films
    starwars$films |> 
      unlist() |> 
      unique() |> 
      length()
    #> [1] 7
    
    # Count unique list elements
    starwars$films |> 
      unique() |> 
      length()
    #> [1] 24
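
The same distinction can be reproduced on a small scale without the dataset. The following is a minimal base-R sketch on a made-up list (the `films` object and its contents are hypothetical, chosen only for illustration). It contrasts counting distinct values with counting distinct list elements, and is not a copy of how skimr computes its summary.

```r
# Toy list column (hypothetical data, not taken from starwars):
# each element holds the films for one character.
films <- list(
  c("The Phantom Menace", "Revenge of the Sith"),
  c("The Phantom Menace"),
  c("The Phantom Menace", "Revenge of the Sith"),  # same combination as element 1
  c("Revenge of the Sith")
)

# Distinct individual films across all characters
n_values <- length(unique(unlist(films)))

# Distinct list elements, i.e. distinct combinations of films
n_elements <- length(unique(films))

# Element lengths, analogous to the min_length / max_length columns
lens <- lengths(films)

print(n_values)    # 2 distinct films
print(n_elements)  # 3 distinct combinations
print(range(lens)) # shortest and longest element: 1 and 2
```

Only two films appear here, yet there are three distinct combinations, which is the same effect that produces 24 for the films column. Note that `unique()` on a list treats elements with the same values in a different order as different elements.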