I'm trying to generate the following YAML structure from tabular data:
- name: Josiah Carberry
  roles:
    - investigation: lead
    - data curation: supporting
I'm struggling with the structure of the roles key. It's basically an array of dictionaries, which would translate to a list of data frames in R.
My issue is that I can't figure out how to store such lists of data frames in a way that will produce the same output as in the example above.
Here's my attempt:
library(tibble)
tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
) |>
  jsonlite::toJSON() |>
  jsonlite::parse_json() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |>
  cat()
Which produces:
- id: id1
  roles:
    - - writing: lead
    - - supervision: supporting
- id: id2
  roles:
    - - writing: equal
As you can see there's one extra dash before each role because of the outer list I use to store the data frames.
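A quick way to see where the extra level comes from is to look at the intermediate JSON. This is only a diagnostic sketch that reuses the data above; `df`, `json`, and `parsed` are names introduced here for illustration. jsonlite writes a data frame as an array of row objects by default, so each one-row tibble should appear as its own single-element array nested inside the array for the outer list.

```r
library(tibble)

# Same toy data as in the attempt above
df <- tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
)

# Intermediate JSON: every [ ... ] here becomes one "- " level in the YAML
json <- jsonlite::toJSON(df)
print(json)

# The nested list that as.yaml() actually receives for the first row
parsed <- jsonlite::parse_json(json)
str(parsed[[1]]$roles)
```

If the nesting looks as described, the doubled dash corresponds to each one-row tibble being wrapped in its own array, one level below the outer list.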
Any idea how I could get the following?
- id: id1
  roles:
    - writing: lead
    - supervision: supporting
- id: id2
  roles:
    - writing: equal
I don't know a way for yaml::as.yaml to do it correctly the first time, but you can always do a simple gsub to change every "- - " manually:
tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
) |>
  jsonlite::toJSON() |>
  jsonlite::parse_json() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |>
  # the `_` placeholder for the native pipe needs R >= 4.2
  gsub("- - ", "- ", x = _) |>
  cat()
# - id: id1
#   roles:
#     - writing: lead
#     - supervision: supporting
# - id: id2
#   roles:
#     - writing: equal
If there's a risk you could have a legitimate "- - " embedded within one of your items, you can make the gsub a bit more specific:
gsub("((^|\n) *)- - ", "\\1- ", x = _)
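To see the difference between the plain and the more specific replacement, here is a minimal sketch on a made-up string. The text in `yaml_text` is an assumption for illustration (it is not output from the pipeline above); its first value deliberately contains a literal "- - ".

```r
# Made-up YAML text (an assumption for illustration): the first value
# contains a literal "- - " that should survive the clean-up.
yaml_text <- paste0(
  "- id: id1\n",
  "  roles:\n",
  "    - - writing: lead - - draft\n",
  "    - - supervision: supporting\n"
)

# Plain replacement: also rewrites the "- - " inside the value
naive <- gsub("- - ", "- ", yaml_text)

# Anchored replacement: only touches "- - " that follows the start of a
# line and optional leading spaces
anchored <- gsub("((^|\n) *)- - ", "\\1- ", yaml_text)

cat(naive)
cat(anchored)
```

The plain version should turn the value into "lead - draft", while the anchored version should leave "lead - - draft" alone and only remove the extra dash at the start of each role line.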
In this pattern:
- (^|\n) matches either the beginning of the string (^) or an embedded newline; this is necessary because as.yaml returns a single string with \n embedded within it, so we cannot rely on ^ to catch them all; we could always use strsplit(_, "\n")[[1]] to split it, then gsub, then recombine, but that seems unnecessary given we can do it in one step
- ((^|\n) *) finds the above plus zero or more spaces; I think " " is safe here instead of "\\s", since I believe as.yaml is always going to put out spaces there; because this is wrapped in parens, it is available as a pattern group for later recall
- "\\1- " replaces everything in the pattern (including the "- - ") with the parenthesized pattern group (previous bullet) and a single hyphen-space

tibble to list:

Up front, this fixes the "- - " problem with the above, though it turns the inline dictionaries into nested ones (each value becomes a one-item list). When read back with yaml::read_yaml, the two forms resolve to the same underlying structure, so if you're okay with the added verbosity then perhaps this is better:
tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
) |>
  transform(roles = rapply(roles, unlist, how = "list")) |>
  jsonlite::toJSON() |>
  jsonlite::parse_json() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |>
  cat()
# - id: id1
#   roles:
#     - writing:
#         - lead
#     - supervision:
#         - supporting
# - id: id2
#   roles:
#     - writing:
#         - equal
This looks slightly different from your expected output, though the entries are still valid YAML dictionaries. That the two results are effectively the same is confirmed with:
out1 <- tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
) |>
  jsonlite::toJSON() |>
  jsonlite::parse_json() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE) |>
  gsub("- - ", "- ", x = _)

out2 <- tibble(
  id = paste0("id", 1:2),
  roles = list(
    list(tibble(writing = "lead"), tibble(supervision = "supporting")),
    list(tibble(writing = "equal"))
  )
) |>
  transform(roles = rapply(roles, unlist, how = "list")) |>
  jsonlite::toJSON() |>
  jsonlite::parse_json() |>
  yaml::as.yaml(indent.mapping.sequence = TRUE)

out1
# [1] "- id: id1\n  roles:\n    - writing: lead\n    - supervision: supporting\n- id: id2\n  roles:\n    - writing: equal\n"
out2
# [1] "- id: id1\n  roles:\n    - writing:\n        - lead\n    - supervision:\n        - supporting\n- id: id2\n  roles:\n    - writing:\n        - equal\n"
identical(yaml::read_yaml(text = out1), yaml::read_yaml(text = out2))
# [1] TRUE
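As far as I can tell, the comparison comes out TRUE because yaml::read_yaml simplifies a sequence of same-typed scalars into an atomic vector, so a one-item sequence and a plain scalar both end up as a length-one character vector. A minimal sketch of that on a single entry (`scalar_form` and `sequence_form` are names made up for this example):

```r
# Two spellings of the same entry: a plain scalar and a one-item sequence
scalar_form   <- yaml::read_yaml(text = "writing: lead\n")
sequence_form <- yaml::read_yaml(text = "writing:\n  - lead\n")

str(scalar_form)
str(sequence_form)

# Should be TRUE if the one-item sequence is simplified as described above
identical(scalar_form, sequence_form)
```

Note that this equivalence is specific to how R reads the YAML back. In the YAML data model a scalar and a one-item sequence are different nodes, so a consumer in another language may see a string in one case and a list in the other; if the file is meant for such a tool, the first output is the safer match for the requested structure.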