This is my transaction data:
data:
id from_id to_id amount date_trx
<fctr> <fctr> <fctr> <dbl> <date>
0 7468 5695 700.0 2005-01-04
1 6213 9379 11832.0 2005-01-08
2 7517 8170 1000.0 2005-01-10
3 6143 9845 4276.0 2005-01-12
4 6254 9640 200.0 2005-01-14
5 6669 5815 200.0 2005-01-20
6 6934 8583 49752.0 2005-01-24
7 9240 8314 19961.0 2005-01-26
8 6374 8865 1000.0 2005-01-30
9 6143 6530 13.4 2005-01-31
...
I formed the network where the edges are formed between the nodes(accounts) from_id
's and to_id
's, and the weights of the edges determined by the amounts they transact. Then I calculated the network's measures such as degree centrality, betweenness centrality, closeness centrality etc.
That is:
relations <- data.frame(from = data$from_id,
to = data$to_id)
network <- graph_from_data_frame(relations, directed = T)
E(network)$weight <- data$amount
V(network)$degree <- degree(network, normalized=TRUE)
V(network)$betweenness <- betweenness(network, normalized=TRUE)
V(network)$closeness <- closeness(network, normalized=TRUE)
But now I want to calculate these measures periodically. For example, I want to divide my data by weeks(starting from the very first transaction date) and calculate the network measures for each account for corresponding weeks.
data$week <- unsplit(tapply(data$date_trx, data$from_id, function(x) (as.numeric(x-min(data$trx_date)) %/% 7)+1),data$from_id)
select(data, from_id, to_id, date_trx, week, amount) %>% arrange(date_trx)
from_id to_id date_trx week amount
<fctr> <fctr> <date> <dbl> <dbl>
6644 6934 2005-01-01 1 700
6753 8456 2005-01-01 1 600
9242 9333 2005-01-01 1 1000
9843 9115 2005-01-01 1 900
7075 6510 2005-01-02 1 400
8685 7207 2005-01-02 1 1100
... ... ... ... ...
9866 6697 2010-12-31 313 95.8
9866 5992 2010-12-31 313 139.1
9866 5797 2010-12-31 313 72.1
9866 9736 2010-12-31 313 278.9
9868 8644 2010-12-31 313 242.8
9869 8399 2010-12-31 313 372.2
As I divided my data into weekly periods, now I need to form networks of accounts for each week separately and so that I can calculate network measures for accounts in weekly periods. How can I do that for 313 weeks and at once?
One possibility is splitting your data according to week, transform each week into an igraph object and then add the centralities and degree to all graphs at once, using lapply. My initial data.frame is named d (see below):
library(igraph)
head(d)
from_id to_id weight date_trx
1 D I 8 1999-09-12
2 E H 10 1999-10-20
3 A G 10 1999-09-10
4 C G 13 1999-04-15
5 E J 9 1999-06-26
6 B F 15 1999-04-30
First get the week:
d$week <- strftime(d$date_trx, format = "%V")
Now split by week:
dd <- split(d, d$week )
Transform each week into an igraph
dd <- lapply(dd, function(x) graph_from_data_frame(x, directed = T))
Write a function that does all the operations you want to carry out, and then apply it to each graph:
my.funct <- function(x) {
V(x)$degree <- degree(x, normalized=TRUE)
V(x)$betweenness <- betweenness(x, normalized=TRUE)
V(x)$closeness <- closeness(x, normalized=TRUE)
return(x)
}
dd <- lapply(dd, my.funct)
For instance, for the first week:
dd[[1]]
IGRAPH f515e52 DN-- 4 2 --
+ attr: name (v/c), degree (v/n), betweenness (v/n), closeness (v/n), weigth (e/n), date_trx
| (e/n), week (e/c)
+ edges from f515e52 (vertex names):
[1] B->F C->G
get.vertex.attribute(dd[[1]])
$name
[1] "B" "C" "F" "G"
$degree
[1] 0.3333333 0.3333333 0.3333333 0.3333333
$betweenness
[1] 0 0 0 0
$closeness
[1] 0.3333333 0.3333333 0.2500000 0.2500000
get.edge.attribute(dd[[1]])
$weight
[1] 9 7
$date_trx
[1] 10595 10601
$week
[1] "01" "01"
You can then retrieve all centralities and degree for all weeks:
ddd <- lapply(dd, function(x) igraph::as_data_frame(x, what = "vertices") )
# keep in mind that `split` names the objects in the list according to
# the value it used to split, therefore the name of the data.frames in
# the list is the name of the week.
library(dplyr)
ddd <- bind_rows(ddd, .id="week")
head(ddd)
week name degree betweenness closeness
1 01 E 1.4444444 0 0.2000000
2 01 D 1.5555556 0 0.1666667
3 01 B 0.7777778 0 0.2000000
4 01 A 1.0000000 0 0.2000000
5 01 C 0.7777778 0 0.1666667
6 01 F 1.0000000 0 0.1000000
In case, you can use this to merge back to the original edges list.
Data used in this example:
set.seed(123)
d <- data.frame(from_id = sample(LETTERS[1:5], 2000, replace = T),
to_id = sample(LETTERS[6:10], 2000, replace = T),
weight = rpois(2000, 10),
date_trx = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 2000, replace = T))