I have a quanteda dictionary that I want to randomly split into n parts.
dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))
I tried using the split function like this: split(dict, f=factor(3)), but it did not work.
I would like to get three dictionaries back, but instead I get a single one:
$`3`
Dictionary object with 2 key entries.
- [positive]:
- good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
- bad, worst, awful, atrocious, deplorable, horrendous
EDIT
I have included a different entry containing * in the dictionary. The solution suggested by Ken Benoit throws an error in this case but works perfectly fine otherwise.
The desired output is something like this:
> dict_1
Dictionary object with 2 key entries.
- [positive]:
- good, wonderf*
- [negative]:
- deplorable, horrendous
> dict_2
Dictionary object with 2 key entries.
- [positive]:
- amazing, best
- [negative]:
- bad, worst
> dict_3
Dictionary object with 2 key entries.
- [positive]:
- outstanding, beautiful
- [negative]:
- awful, atrocious
If the number of entries cannot be divided by n without a remainder, I have no firm requirement, but ideally I would be able to choose between (i) getting the 'remainder' separately and (ii) having all values distributed (which results in some splits being slightly larger).
The question leaves a lot unspecified: with dictionary keys of different lengths it is unclear how the split should be handled, and there is no pattern to the pairs in your expected output.
Here, I've assumed that the keys are of equal length, that this length is divisible by the split size without a remainder, and that you want to split each dictionary key into running, adjacent intervals.
This should do it. Note that len is the number of values per key in each part, not the number of parts, so len = 2 yields three dictionaries here.
library("quanteda")
## Package version: 1.5.1
dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)
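dictionary_split <- function(x, len) {
  # length of the longest key
  maxlen <- max(lengths(x)) # change to minimum to avoid recycling
  # group the positions 1..maxlen into consecutive chunks of size len
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  # for each chunk of positions, take those values from every key
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  # turn each list of key values back into a dictionary
  lapply(splitlist, dictionary)
}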
dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
## - good, amazing
## - [negative]:
## - bad, worst
##
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
## - best, outstanding
## - [negative]:
## - awful, atrocious
##
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
## - beautiful, delightful
## - [negative]:
## - deplorable, horrendous
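
The function above splits in adjacent runs rather than randomly, and it assumes no remainder. As a rough sketch of a random variant, the following shuffles a balanced vector of part labels for each key, so that leftover values are spread across the parts (option (ii) in the question). The helper name dictionary_split_random is made up for illustration, and the code assumes a flat (non-nested) dictionary, that as.list() on a dictionary returns a named list of character vectors, and that n does not exceed the length of the shortest key. I have only reasoned through it, not tested it against every quanteda version.

```r
library("quanteda")

dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)

# Hypothetical helper (not part of quanteda): randomly split each key into n parts.
dictionary_split_random <- function(x, n) {
  # assumption: flat dictionary, so this is a named list of character vectors
  keys <- as.list(x)
  # for each key, build labels 1..n repeated to the key's length, then shuffle;
  # part sizes therefore differ by at most one value per key
  groups <- lapply(keys, function(v) sample(rep_len(seq_len(n), length(v))))
  parts <- lapply(seq_len(n), function(i) {
    # pick the values of every key that were assigned to part i
    vals <- Map(function(v, g) v[g == i], keys, groups)
    # drop keys that received no values in this part
    dictionary(vals[lengths(vals) > 0])
  })
  names(parts) <- paste0("dict_", seq_len(n))
  parts
}

set.seed(42) # only to make the random split reproducible
dictionary_split_random(dict, 3)
```

Because the splitting is done on plain character vectors, a pattern such as wonderf* is handled like any other string at that step. To get the remainder as a separate part instead (option (i)), one possible approach is to shuffle each key and then chunk it as in dictionary_split above.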