Account | Tier 1 | Tier 2 | Tier 3 |
---|---|---|---|
Account 1 | 1 | 21 | 55 |
Account 2 | 1 | 21 | 55 |
Account 3 | 2 | 52 | 72 |
Account 4 | 2 | 52 | 66 |
Account 5 | 3 | 52 | 66 |
Account 6 | 3 | 52 | 99 |
Account 7 | 4 | 63 | 43 |
Account 8 | 4 | 63 | 49 |
Account 9 | 5 | 36 | 94 |
Account 10 | 5 | 42 | 55 |
How do I group the accounts based on all 3 tiers according to the following?
If Account A & B are in the same group in Tier 1 and if Account 1, 2 & 10 are in the same group in Tier 3, then Account 1, 2, 9 & 10 need to be classified in the same final group. Account 9 was included because it groups with Account 10 in Tier 1.
Desired output:
Account | Tier 1 | Tier 2 | Tier 3 | Final Group |
---|---|---|---|---|
Account 1 | 1 | 21 | 55 | Group 1 |
Account 2 | 1 | 21 | 55 | Group 1 |
Account 3 | 2 | 52 | 72 | Group 2 |
Account 4 | 2 | 52 | 66 | Group 2 |
Account 5 | 3 | 52 | 66 | Group 2 |
Account 6 | 3 | 52 | 99 | Group 2 |
Account 7 | 4 | 63 | 43 | Group 3 |
Account 8 | 4 | 63 | 49 | Group 3 |
Account 9 | 5 | 36 | 94 | Group 1 |
Account 10 | 5 | 42 | 55 | Group 1 |
This is a simplified version of a bigger problem which has 7+ tiers across 500k data points.
Another example where graph theory and the igraph package comes in handy, to find the connected components:
## make a two-column 'edgelist' showing the account<-->tier relationships
datlong <- cbind(dat[1], stack(dat[-1]))
library(igraph)
## make a graph object
g <- graph_from_data_frame(datlong[1:2])
## extract the connected components
cmp <- components(g)
## assign the membership of the accounts based on the components
dat$final_group <- cmp$membership[match(dat$Account, names(cmp$membership))]
dat
# Account Tier1 Tier2 Tier3 final_group
#1 Account1 1 21 55 1
#2 Account2 1 21 55 1
#3 Account3 2 52 72 2
#4 Account4 2 52 66 2
#5 Account5 3 52 66 2
#6 Account6 3 52 99 2
#7 Account7 4 63 43 3
#8 Account8 4 63 49 3
#9 Account9 5 36 94 1
#10 Account10 5 42 55 1
Where dat
was:
dat <- read.table(text="Account Tier1 Tier2 Tier3
Account1 1 21 55
Account2 1 21 55
Account3 2 52 72
Account4 2 52 66
Account5 3 52 66
Account6 3 52 99
Account7 4 63 43
Account8 4 63 49
Account9 5 36 94
Account10 5 42 55", header=TRUE)