How to build a dataframe from an (empty) vector?


The code snippet below converts a pair of vectors to a data frame, adding one column for the provenance ("State") and another for the type ("Ingredient") along the way.

overflow  <- setdiff(c(21, 23, 27), c(21, 23))
underflow <- setdiff(c(11, 13, 17), c(17))

dfo <- data.frame("State"="over", Value=overflow)
dfu <- data.frame("State"="under", Value=underflow)
df <- rbind(dfo, dfu)

df$Ingredient <- "Beans"

With the given data all is well. We get the following dataframe.

> df
  State Value Ingredient
1  over    27      Beans
2 under    11      Beans
3 under    13      Beans

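But this breaks in the boundary case where setdiff produces an empty vector (e.g. underflow <- setdiff(c(11, 13, 17), c(11, 13, 17))): data.frame cannot recycle the length-one "State" value against a zero-length Value, so the call stops with an error.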
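For reference, a minimal sketch of the failure. The tryCatch wrapper is only there so the snippet runs to completion, and the names err_msg and dfu_ok are illustrative, not part of the original code.

```r
# Empty result: every element of the first vector is removed.
underflow <- setdiff(c(11, 13, 17), c(11, 13, 17))
length(underflow)  # 0

# data.frame() cannot recycle the length-one "under" to zero rows,
# so the call stops; capture the error message instead of aborting.
err_msg <- tryCatch(
  {
    data.frame(State = "under", Value = underflow)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
err_msg

# Giving State the same (zero) length as Value avoids the error.
dfu_ok <- data.frame(State = rep("under", length(underflow)),
                     Value = underflow)
nrow(dfu_ok)  # 0
```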

How can I build a dataframe from a vector while handling the case of an empty vector? The option of carrying around a "data frame is empty" flag would be a bad one since the code would become peppered with if statements.

Update

In lieu of a comment to @AndS.'s suggestion:

Replacing data.frame with dplyr::data_frame works well, at least initially. But inserting a column remains problematic: if both overflow and underflow are empty vectors, df$Ingredient <- "Beans" fails.
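A possible way around the column problem is to size the new column from nrow(df) rather than assigning a scalar. The sketch below uses a plain base R data.frame (an assumption; tibble versions may recycle differently), and scalar_ok is just an illustrative name.

```r
# A zero-row data frame, as produced when both setdiff() results are empty.
df <- data.frame(State = character(0), Value = numeric(0))

# Assigning a length-one value fails: base R will not recycle 1 to 0 rows.
scalar_ok <- tryCatch(
  {
    df$Ingredient <- "Beans"
    TRUE
  },
  error = function(e) FALSE
)
scalar_ok  # FALSE

# Sizing the value from nrow(df) works for any row count, including zero.
df$Ingredient <- rep("Beans", nrow(df))
str(df)
```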


Solution

  • Using dplyr::data_frame is probably the best option, but here's a base R approach just for fun:

    flow <- list(over  = setdiff(c(21, 23, 27), c(21, 23)),
                 under = setdiff(c(11, 13, 17), c(17)))
    
    
    flow.df <- Map(function(State, x) 
                    if(length(x)) data.frame(State, x, Ingredient = 'Beans')
                   , names(flow)
                   , flow)
    
    df <- do.call(rbind, flow.df)
    
    df
    
    #         State  x Ingredient
    # over     over 27      Beans
    # under.1 under 11      Beans
    # under.2 under 13      Beans
    

    When one of them is empty, the if without an else returns NULL for that element, and rbind simply drops it:

    flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
                 under = setdiff(c(11, 13, 17), c(17)))
    
    
    flow.df <- Map(function(State, x) 
                    if(length(x)) data.frame(State, x, Ingredient = 'Beans')
                   , names(flow)
                   , flow)
    
    df <- do.call(rbind, flow.df)
    
    df
    
    #         State  x Ingredient
    # under.1 under 11      Beans
    # under.2 under 13      Beans
    
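One caveat with the if approach, sketched under the assumption that both vectors can be empty at the same time: every list element is then NULL, so do.call(rbind, ...) returns NULL rather than a zero-row data frame. The fallback at the end is only one possible remedy, and it does reintroduce an if.

```r
# Both differences are empty here.
flow <- list(over  = setdiff(c(21, 23), c(21, 23)),
             under = setdiff(c(11, 13), c(11, 13)))

# Same pattern as above: NULL for every empty element.
flow.df <- Map(function(State, x)
                 if (length(x)) data.frame(State, x, Ingredient = "Beans"),
               names(flow), flow)

df <- do.call(rbind, flow.df)
is.null(df)  # TRUE: there is no data frame to add columns to

# Possible fallback: substitute an explicit zero-row data frame.
if (is.null(df)) {
  df <- data.frame(State = character(0), x = numeric(0),
                   Ingredient = character(0))
}
nrow(df)  # 0
```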

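    Using dplyr::data_frame and dplyr::mutate as suggested by @AndS. lets you avoid the if statement, because a length-one State is recycled to zero rows when x is empty (data_frame has since been deprecated in favour of tibble, which can be used the same way here):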

    library(dplyr)
    
    flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
                 under = setdiff(c(11, 13, 17), c(17)))
    
    
    flow.df <- Map(function(State, x) data_frame(State, x)
                   , names(flow)
                   , flow)
    
    df <- do.call(rbind, flow.df)
    
    df %>% mutate(Ingredient = 'Beans')
    
    # # A tibble: 2 x 3
    #   State     x Ingredient
    # * <chr> <dbl> <chr>     
    # 1 under  11.0 Beans     
    # 2 under  13.0 Beans   
    

    Another commenter, who has since deleted their comment, pointed out that you can use rep with times = length(x), where x is overflow or underflow, so that every column has the same (possibly zero) length:

    flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
                 under = setdiff(c(11, 13, 17), c(17)))
    
    
    flow.df <- Map(function(State, x, len) 
                    data.frame(State = rep(State, len)
                               , x
                               , Ingredient = rep('Beans', len))
                   , names(flow)
                   , flow
                   , lengths(flow))
    
    df <- do.call(rbind, flow.df)
    
    df
    
    #         State  x Ingredient
    # under.1 under 11      Beans
    # under.2 under 13      Beans
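The same rep idea can also be applied to the whole list at once, without Map or rbind. This is only a sketch: build_flow_df and its ingredient argument are hypothetical names, the value column is called Value as in the question, and the row names come out as plain 1, 2, ... instead of under.1, under.2.

```r
flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
             under = setdiff(c(11, 13, 17), c(17)))

# Hypothetical helper: one data frame from a named list of numeric vectors.
build_flow_df <- function(flow, ingredient = "Beans") {
  # as.numeric() guards against unlist() returning NULL
  values <- as.numeric(unlist(flow, use.names = FALSE))
  data.frame(State      = rep(names(flow), lengths(flow)),
             Value      = values,
             Ingredient = rep(ingredient, length(values)))
}

df <- build_flow_df(flow)
df

# Both vectors empty: still a data frame, with zero rows and three columns.
empty <- build_flow_df(list(over = numeric(0), under = numeric(0)))
nrow(empty)  # 0
```

Because every column is built with an explicit length, no if statement or emptiness flag is needed downstream.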