The code snippet below converts a pair of vectors to a data frame, filling in along the way one column to indicate the provenance ("State") and another to indicate the type ("Ingredient").
overflow <- setdiff(c(21, 23, 27), c(21, 23))
underflow <- setdiff(c(11, 13, 17), c(17))
dfo <- data.frame("State"="over", Value=overflow)
dfu <- data.frame("State"="under", Value=underflow)
df <- rbind(dfo, dfu)
df$Ingredient <- "Beans"
With the given data all is well. We get the following dataframe.
> df
  State Value Ingredient
1  over    27      Beans
2 under    11      Beans
3 under    13      Beans
But this is not good enough for the boundary case when setdiff produces an empty vector, e.g. underflow <- setdiff(c(11, 13, 17), c(11, 13, 17)). In that case data.frame fails, because "State" has length 1 while Value has length 0.
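For reference, the failing boundary case can be reproduced with a minimal sketch like the one below. The names msg and "no error" are only illustrative. tryCatch captures the error message instead of stopping the script.

```r
# Empty result: every element of the first vector is also in the second
underflow <- setdiff(c(11, 13, 17), c(11, 13, 17))

# data.frame() cannot recycle the length-1 "State" against a length-0 Value,
# so it raises an error; capture the message rather than aborting
msg <- tryCatch({
  data.frame(State = "under", Value = underflow)
  "no error"
}, error = function(e) conditionMessage(e))

print(length(underflow))
print(msg)
```

The captured message complains that the arguments imply differing numbers of rows.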
How can I build a dataframe from a vector while handling the case of an empty vector? The option of carrying around a "data frame is empty" flag would be a bad one since the code would become peppered with if statements.
Update
In lieu of a comment to @AndS.'s suggestion:
Replacing data.frame with dplyr::data_frame works well, at least initially. But inserting a column remains problematic: if both overflow and underflow are empty vectors, df$Ingredient <- "Beans" fails.
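The column-insertion problem can be illustrated with a small base R sketch. This assumes a plain data.frame with zero rows rather than a tibble, and the variable failed is only illustrative. Assigning a length-1 value to a zero-row data frame errors, whereas a value repeated nrow(df) times has the matching length of zero.

```r
# A zero-row data frame standing in for the "both vectors empty" case
df <- data.frame(State = character(0), Value = numeric(0))

# A scalar cannot be assigned as a column of a 0-row data frame
failed <- inherits(try(df$Ingredient <- "Beans", silent = TRUE), "try-error")
print(failed)

# Repeating the value nrow(df) times works for any row count, including 0
df$Ingredient <- rep("Beans", nrow(df))
str(df)
```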
Using dplyr::data_frame is probably the best option, but here's a base R approach just for fun:
flow <- list(over  = setdiff(c(21, 23, 27), c(21, 23)),
             under = setdiff(c(11, 13, 17), c(17)))
# One data frame per state; an empty vector yields NULL, which rbind() drops
flow.df <- Map(function(State, x)
                 if (length(x)) data.frame(State, x, Ingredient = 'Beans')
               , names(flow)
               , flow)
df <- do.call(rbind, flow.df)
df
#         State  x Ingredient
# over     over 27      Beans
# under.1 under 11      Beans
# under.2 under 13      Beans
When one of them is empty:
flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
             under = setdiff(c(11, 13, 17), c(17)))
# "over" is now empty, so its entry in flow.df is NULL
flow.df <- Map(function(State, x)
                 if (length(x)) data.frame(State, x, Ingredient = 'Beans')
               , names(flow)
               , flow)
df <- do.call(rbind, flow.df)
df
#         State  x Ingredient
# under.1 under 11      Beans
# under.2 under 13      Beans
Using dplyr::data_frame and dplyr::mutate as suggested by @AndS. lets you avoid the if statement (in current versions of tibble, data_frame is deprecated in favor of tibble):
library(dplyr)
flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
             under = setdiff(c(11, 13, 17), c(17)))
# data_frame() recycles the length-1 State to zero rows when x is empty
flow.df <- Map(function(State, x) data_frame(State, x)
               , names(flow)
               , flow)
df <- do.call(rbind, flow.df)
df %>% mutate(Ingredient = 'Beans')
# # A tibble: 2 x 3
#   State     x Ingredient
# * <chr> <dbl> <chr>
# 1 under  11.0 Beans
# 2 under  13.0 Beans
Another commenter, who has since deleted their comment, pointed out you can use rep with times = length(x), where x is overflow or underflow:
flow <- list(over  = setdiff(c(21, 23, 21), c(21, 23)),
             under = setdiff(c(11, 13, 17), c(17)))
# rep(..., 0) gives zero-length columns, so an empty vector yields a 0-row data frame
flow.df <- Map(function(State, x, len)
                 data.frame(State = rep(State, len)
                            , x
                            , Ingredient = rep('Beans', len))
               , names(flow)
               , flow
               , lengths(flow))
df <- do.call(rbind, flow.df)
df
#         State  x Ingredient
# under.1 under 11      Beans
# under.2 under 13      Beans
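One caveat with the if-based version above: when every vector is empty, all list entries are NULL and do.call(rbind, ...) returns NULL rather than a data frame. The rep idea can be folded into a single helper that needs no Map or rbind and still returns a zero-row data frame with all three columns in that case. This is only a sketch. The function name flow_to_df and its ingredient argument are hypothetical, and the column is called Value as in the question rather than x.

```r
# Hypothetical helper: turn a named list of numeric vectors into one data frame.
# Assumes every element of `flow` is an atomic vector (possibly of length 0).
flow_to_df <- function(flow, ingredient = "Beans") {
  n <- lengths(flow)  # number of values per state, may be 0
  data.frame(
    State = rep(names(flow), times = n),          # each name repeated n[i] times
    Value = unlist(flow, use.names = FALSE),      # all values, in list order
    Ingredient = rep(ingredient, times = sum(n)), # one entry per row
    stringsAsFactors = FALSE
  )
}

# Regular case from the question
full <- flow_to_df(list(over  = setdiff(c(21, 23, 27), c(21, 23)),
                        under = setdiff(c(11, 13, 17), c(17))))
print(full)

# Boundary case: both vectors are empty
empty <- flow_to_df(list(over  = setdiff(c(21, 23), c(21, 23)),
                         under = setdiff(c(11, 13, 17), c(11, 13, 17))))
str(empty)
```

Because every column is built with a length derived from lengths(flow), no if statement or "is empty" flag is needed downstream.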