I have a two-column data.frame with a variable number of child values per head (the other two columns are for reference only).
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
head child UID logic
1 01001 01001 1 FALSE
2 01001 01021 2 TRUE
3 01001 01047 3 TRUE
4 01001 01051 4 TRUE
5 01001 01085 5 TRUE
6 01001 01101 6 TRUE
7 01003 01003 7 FALSE
8 01003 01025 8 TRUE
9 01003 01053 9 TRUE
10 01003 01097 10 TRUE
11 01003 01099 11 TRUE
12 01003 01129 12 TRUE
13 01003 12033 13 TRUE
14 01005 01005 14 FALSE
15 01005 01011 15 TRUE
16 01005 01045 16 TRUE
17 01005 01067 17 TRUE
18 01005 01109 18 TRUE
19 01005 01113 19 TRUE
20 01005 13061 20 TRUE
21 01005 13239 21 TRUE
22 01005 13259 22 TRUE")
I would like to have only three rows, one per unique head, with a list of the child values for each.
If you have a suggestion of a better way to do this, I am open to it.
I have added the other columns, UID and logic, for reference, but they can be dropped.
In my attempts, I have tried to convert to a graph with an edgelist, then to JSON.
# make graph ##########
library(tidyverse)
library(igraph)
library(jsonlite)
gdat <- select(dat, head, child)
mdat <- as.matrix(gdat)
edge_dat <- graph_from_edgelist(mdat)
plot.igraph(edge_dat)
jdat <- toJSON(mdat, matrix = "rowmajor")
Desired output:
head child1 child2 child3 child4 child5 child6 child7
01001 01001 01021 01047 01051 01085 01101 NA
01003 01003 01025 01053 ... and so on
01005 01005 01011 ... and so on
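Is this what you want?
library(data.table)
setDT(dat)
# keep only the rows where logic is TRUE (this drops the rows where head == child)
dat_child <- dat[(logic)]
# collapse the children of each head into a single list column
dat_child <- dat_child[, .(list(unique(child))), by = "head"]
dat_child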
head V1
1: 1001 1021,1047,1051,1085,1101
2: 1003 1025, 1053, 1097, 1099, 1129,12033
3: 1005 1011, 1045, 1067, 1109, 1113,13061,...
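If you would rather have the wide layout shown under "Desired output" in the question, here is a rough sketch with data.table's dcast. It is only an illustration under a few assumptions: it uses a small subset of the question's rows, it stores the codes as character so the leading zeros survive (read.table would otherwise parse them as integers), and the names idx and wide are made up for this example.

```r
library(data.table)

# A subset of the question's rows, with codes kept as character
# so that leading zeros are preserved (an assumption of this sketch)
dat <- data.table(
  head  = c("01001", "01001", "01001", "01003", "01003", "01003", "01003"),
  child = c("01001", "01021", "01047", "01003", "01025", "01053", "01097")
)

# Number the children within each head: 1, 2, 3, ...
dat[, idx := rowid(head)]

# One row per head, one column per child position;
# heads with fewer children get NA in the trailing columns
wide <- dcast(dat, head ~ idx, value.var = "child")

# Rename the position columns "1", "2", ... to child1, child2, ...
setnames(wide, names(wide)[-1], paste0("child", names(wide)[-1]))

print(wide)
```

Numbering the positions as integers before casting keeps the columns in numeric order, which would matter if a head had ten or more children.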