I have a string vector in R:
originalvector <- c("apple pie {we have some text here}", "banana {something{something}}", "cherry {asd9asdjsaf}", "banana {monkey}")
[1] "apple pie {we have some text here}" "banana {something{something}}"
[3] "cherry {asd9asdjsaf}" "banana {monkey}"
I would like to turn this into a named character vector in which the FIRST opening curly bracket separates the name from the corresponding element while also remaining part of the element. If a name occurs more than once, the elements under that name should be joined with a newline, so that the result is:
apple pie banana
"{we have some text here}" "{something{something}}\n{monkey}"
cherry
"{asd9asdjsaf}"
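The separator rule (split at the FIRST "{", keep the bracket in the element) can be checked in isolation with base R's regexpr() and substr(). This is a minimal sketch that assumes every string contains at least one "{"; the variable name pos is illustrative, and trimws() is used here to drop the space between the name and the bracket.

```r
originalvector <- c("apple pie {we have some text here}", "banana {something{something}}",
                    "cherry {asd9asdjsaf}", "banana {monkey}")

# Position of the FIRST "{" in each string (fixed = TRUE: literal match, not a regex)
pos <- regexpr("{", originalvector, fixed = TRUE)

# Everything before the first "{" becomes the name, with surrounding spaces trimmed
elemNames <- trimws(substr(originalvector, 1, pos - 1))

# Everything from the first "{" onwards, bracket included, becomes the element
elems <- substring(originalvector, pos)
```

A string without any "{" gets pos = -1, which yields an empty name and leaves the whole string as the element.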
This can be achieved using regular expressions and iteration (such as sapply, a loop, etc.):
library(dplyr)
elemNames <- originalvector %>% gsub("\\{.*", "", .) #remove the first "{"-character and everything after it
elems <- originalvector %>% sub(".*?\\{", "{", .) #replace the first "{"-character and everything before it with just "{"
names(elems) <- elemNames
# for each distinct name, collect its elements (exact name match) and join them with newlines
newvector <- sapply(unique(elemNames), \(elemName) {
  elems[names(elems) == elemName] %>% {paste(., collapse = "\n")}
}) %>% setNames(unique(elemNames))
However, I was wondering whether there is a more elegant solution (possibly a one-liner). My initial solution looks so ugly and complicated. :)
You can simplify this using tapply():
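elemNames <- gsub("\\s?\\{.*", "", originalvector)  # name: text before the first "{", minus an optional space
elems <- sub(".*?\\{", "{", originalvector)         # element: the first "{" and everything after it
tapply(elems, elemNames, paste, collapse='\n')      # join the elements that share a name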
# apple pie banana
# "{we have some text here}" "{something{something}}\n{monkey}"
# cherry
# "{asd9asdjsaf}"
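Two details of tapply() may matter depending on how the result is used: it returns a one-dimensional array rather than a plain named vector, and it orders the groups by the sorted factor levels of elemNames rather than by first appearance (the two orders happen to coincide here). A minimal sketch, assuming the same originalvector; the names groups, grouped and newvector2 are illustrative, and elems is computed with an equivalent pattern for strings that contain a "{".

```r
originalvector <- c("apple pie {we have some text here}", "banana {something{something}}",
                    "cherry {asd9asdjsaf}", "banana {monkey}")
elemNames <- gsub("\\s?\\{.*", "", originalvector)
# Drop everything before the first "{" (same result as the lazy pattern when a "{" exists)
elems <- sub("^[^{]*", "", originalvector)

# Factor levels in order of first appearance; tapply() would otherwise sort them
groups <- factor(elemNames, levels = unique(elemNames))

# tapply() returns a 1-D array; copy it into a plain named character vector
grouped <- tapply(elems, groups, paste, collapse = "\n")
newvector <- setNames(as.vector(grouped), names(grouped))

# Equivalent: split() does the grouping and vapply() returns a named vector directly
newvector2 <- vapply(split(elems, groups), paste, character(1), collapse = "\n")
```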
I slightly modified your first regular expression so that the space between the element name and the opening bracket, when present, is removed as well.
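The effect of that change is easiest to see on two entries that differ only in the space before "{". With the original pattern the space stays in the name, so the two entries would end up under two different names; the variable names before and after are illustrative.

```r
x <- c("banana{something{something}}", "banana {monkey}")

# Original pattern: drops the first "{" and everything after it, keeps a preceding space
before <- gsub("\\{.*", "", x)      # c("banana", "banana ")

# Modified pattern: an optional whitespace character before "{" is dropped too
after <- gsub("\\s?\\{.*", "", x)   # c("banana", "banana")
```

Note that \\s? removes at most one whitespace character; \\s* would also handle names followed by several spaces.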