Sorry if this question has been asked, I played with my toy data to learn to manipulate data.tables. My goal was from this data:
toy_data = data.table(from=c("A","A","A","C","E","E","A","A","A","C","E","E"),
to=c("B","C","A","D","F","E","E","A","A","A","C",NA))
to arrive at this result:
final_matrix
L A B C D E F
1: A 3 1 2 <NA> 1 <NA>
2: B 1 0 <NA> <NA> <NA> <NA>
3: C 2 <NA> 0 1 1 <NA>
4: D <NA> <NA> 1 0 <NA> <NA>
5: E 1 <NA> 1 <NA> 1 1
6: F <NA> <NA> <NA> <NA> 1 0
7: tot 7 1 4 1 4 1
(eventually also with zeros instead of NAs, but got bored). I suppose in STATA this would be an easy cross-tab, I have built a function then looped over the unique values in the cols (sigh :/) merged the tables and then added a final line with the totals. Now although I've learned a lot, I wonder what would the clean R way to obtain such cross-tabs be? since the following doesn't work:
table(toy_data$from,toy_data$to)
A B C D E F
A 3 1 1 0 1 0
C 1 0 0 1 0 0
E 0 0 1 0 1 1
Thanks. My function if you have general improvements or best practices I am super happy:
create_edge_cols<- function(dt,column){
#this function takes a df and a column,
#computes the number of edges among this column and all the other in dt
#returns a column (list) with the cross-tabulation of columns
tot_edges_i = dim(dt[from==column|to==column][,.(to=na.omit(to))])[1] # E better! without NAs
print(tot_edges_i)
# now tabulate links of column
tab = data.table(table(unlist(dt[(from==column&to!=column)|
(from!=column&to==column)])))
setnames(tab, "V1", "L")
setnames(tab, "N", column)
setorder(tab,"L")
tab[L==column,column] = length(dt[to==column & to == from,from])
#tab[,`:=`(L=L,column=column/as.numeric(tot_edges_i))]
return(tab)
}
#this should be the first column of our table
first_column = data.table("L"=unique(toy_data[,c(to[!is.na(to)],from)]))
#loop through the values of the columns and merge to a unique df
for (col in sort(unique(toy_data[!is.na(to),c(to,from)]))){
info_column = copy(create_edge_cols(toy_data,col))
first_column = merge.data.table(first_column,info_column,all.x = TRUE,all.y = TRUE)
}
## function to set first row as name
header.true <- function(df) {
names(df) <- as.character(unlist(df[1,]))
df[-1,]
}
# this should be the last row of our matrix:
last_row = transpose(data.table(table(unlist(toy_data[!is.na(toy_data$to),c(from,to[to!=from])]))))
last_row = cbind(data.table(matrix(c("L","tot"), ncol=1)),last_row)
last_row = header.true(last_row)
last_row
# let's concatenate
final_matrix = rbind(first_column,last_row)
final_matrix
EDIT: solution suggested by previous answer now deleted:
library(igraph)
g <- graph_from_data_frame(na.omit(toy_data), directed = F)
am <- as_adjacency_matrix(g, type = "both")
addmargins(as.matrix(am[order(rownames(am)), order(colnames(am))]), 1)
Here is a way. What is missing in the question's table
statement are factor levels, table
is only processing what is in the data. Coerce the columns to factors with the same levels and assign NA
to counts equal to zero.
There is also a print
issue, see the final two instructions. The default for S# class "table"
method print is not to print NA
's. This can be changed manually.
library(data.table)
toy_data = data.table(from=c("A","A","A","C","E","E","A","A","A","C","E","E"),
to=c("B","C","A","D","F","E","E","A","A","A","C",NA))
levels <- sort(unique(unlist(toy_data)))
levels <- levels[!is.na(levels)]
toy_data[, c("from", "to") := lapply(.SD, factor, levels = levels)]
tbl <- table(toy_data)
is.na(tbl) <- tbl == 0
tbl
#> to
#> from A B C D E F
#> A 3 1 1 1
#> B
#> C 1 1
#> D
#> E 1 1 1
#> F
print(tbl, na.print = NA)
#> to
#> from A B C D E F
#> A 3 1 1 <NA> 1 <NA>
#> B <NA> <NA> <NA> <NA> <NA> <NA>
#> C 1 <NA> <NA> 1 <NA> <NA>
#> D <NA> <NA> <NA> <NA> <NA> <NA>
#> E <NA> <NA> 1 <NA> 1 1
#> F <NA> <NA> <NA> <NA> <NA> <NA>
Created on 2022-03-28 by the reprex package (v2.0.1)
To add a column sums row at the bottom of the cross table, rbind
the result above with colSums
. Note that there's no longer need for print(tbl, na.print = NA)
, the method print
(autoprint) being called is now the matrix method.
library(data.table)
toy_data = data.table(from=c("A","A","A","C","E","E","A","A","A","C","E","E"),
to=c("B","C","A","D","F","E","E","A","A","A","C",NA))
levels <- sort(unique(unlist(toy_data)))
levels <- levels[!is.na(levels)]
toy_data[, c("from", "to") := lapply(.SD, factor, levels = levels)]
tbl <- table(toy_data)
class(tbl) # check the output object class
#> [1] "table"
tbl <- rbind(tbl, tot = colSums(tbl, na.rm = TRUE))
is.na(tbl) <- tbl == 0
class(tbl) # check the output object class, it's no longer "table"
#> [1] "matrix" "array"
tbl
#> A B C D E F
#> A 3 1 1 NA 1 NA
#> B NA NA NA NA NA NA
#> C 1 NA NA 1 NA NA
#> D NA NA NA NA NA NA
#> E NA NA 1 NA 1 1
#> F NA NA NA NA NA NA
#> tot 4 1 2 1 2 1
Created on 2022-03-29 by the reprex package (v2.0.1)