The data frame I am working on is coded in dyadic format, where each observation (i.e., row) contains a source node (from) and a target node (to) along with some other dyadic covariates (such as the dyadic correlation, corr).
For simplicity's sake, I want to treat each dyad as unordered and generate a unique identifier for each dyad, like the one (i.e., df1) below:
# original data
df <- data.frame(
from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
to = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5))
from to corr
1 A B 0.50
2 A C 0.70
3 A D 0.20
4 B C 0.15
5 C B 0.15
6 A B 0.50
7 D A 0.20
8 E A 0.45
9 F A 0.54
10 B A 0.50
# desired format
df1 <- data.frame(
from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
to = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5),
dyad = c(1, 2, 3, 4, 4, 1, 3, 5, 6, 1))
from to corr dyad
1 A B 0.50 1
2 A C 0.70 2
3 A D 0.20 3
4 B C 0.15 4
5 C B 0.15 4
6 A B 0.50 1
7 D A 0.20 3
8 E A 0.45 5
9 F A 0.54 6
10 B A 0.50 1
Here, dyads such as A-B/B-A and A-D/D-A are treated as identical pairs and are assigned the same dyad identifier. While it is easy to extract a list of unordered pairs from the original data, it is hard to map them back onto the original data frame to generate unordered dyad identifiers. Could anyone offer some insights on this?
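One way using apply() could be to sort() and paste() the values in the two columns row by row, convert the result to a factor, and then to an integer to get a unique number for each combination. Setting levels = unique(df$temp) numbers the dyads in order of first appearance.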
df$temp <- apply(df[1:2], 1, function(x) paste(sort(x), collapse = "_"))
df$dyad <- as.integer(factor(df$temp, levels = unique(df$temp)))
df$temp <- NULL
df
# from to corr dyad
#1 A B 0.50 1
#2 A C 0.70 2
#3 A D 0.20 3
#4 B C 0.15 4
#5 C B 0.15 4
#6 A B 0.50 1
#7 D A 0.20 3
#8 E A 0.45 5
#9 F A 0.54 6
#10 B A 0.50 1
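For larger data, the row-wise apply() can get slow. A possible vectorized alternative, offered as a sketch rather than a tested drop-in, is to build the sorted key with pmin() and pmax() and number the keys with match(). It assumes from and to are character columns (hence stringsAsFactors = FALSE, which matters on R versions before 4.0) and that "_" never occurs inside a node name; lo, hi, and key are illustrative names, not part of the original code.

```r
# Same example data as in the question; characters, not factors.
df <- data.frame(
  from = c("A", "A", "A", "B", "C", "A", "D", "E", "F", "B"),
  to   = c("B", "C", "D", "C", "B", "B", "A", "A", "A", "A"),
  corr = c(0.5, 0.7, 0.2, 0.15, 0.15, 0.5, 0.2, 0.45, 0.54, 0.5),
  stringsAsFactors = FALSE
)

# Element-wise smaller and larger node name, so A-B and B-A give the same pair.
lo <- pmin(df$from, df$to)
hi <- pmax(df$from, df$to)

# One key per unordered dyad (assumes "_" does not appear in node names).
key <- paste(lo, hi, sep = "_")

# Number the keys in order of first appearance.
df$dyad <- match(key, unique(key))
df
```

On the example data this should assign the same identifiers as the apply() version, since both number the unordered pairs by first appearance.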