r, dplyr, transform, identity, matrix-multiplication

Creating matrix from data frame according to external variable


I am trying to create a simple matrix of countries showing whether they border one another (strictly speaking an adjacency matrix rather than an identity matrix). The idea is to have a large matrix whose rows and columns are country names, with each cell set to 1 if the two countries share a border and 0 if they do not.

For example, given this dataset:

library(data.table)  # provides fread() and dcast()

mydata <- fread( "country    border
Afghanistan     China
Afghanistan     Iran
Afghanistan     Pakistan
Afghanistan     Tajikistan
Afghanistan     Turkmenistan
Afghanistan     Uzbekistan
Aland_Islands   NA
Albania         Greece
Albania         Montenegro
Albania         North_Macedonia
Albania         Serbia
Algeria         Libya
Algeria         Mali
Algeria         Mauritania
Algeria         Morocco
Algeria         Niger
Algeria         Tunisia")

I would like to create the following:

# Stored under a separate name so the input `mydata` is not overwritten
desired <- fread( "Country Afghanistan China Iran Pakistan Tajikistan Turkmenistan Uzbekistan Greece Albania Montenegro
Afghanistan 0 1 1 1 1 1 1 0 0 0
China 1 0 0 0 0 0 0 0 0 0
Iran 1 0 0 1 0 1 0 0 0 0
Pakistan 1 1 1 0 0 0 0 0 0 0
")
Country     Afghanistan China Iran Pakistan Tajikistan Turkmenistan Uzbekistan Greece Albania Montenegro
Afghanistan           0     1    1        1          1            1          1      0       0          0
      China           1     0    0        0          0            0          0      0       0          0
       Iran           1     0    0        1          0            1          0      0       0          0
   Pakistan           1     1    1        0          0            0          0      0       0          0

Solution

  • Since your data is already a data.table, try casting it to wide format, using 'length' as the aggregate function and filling missing combinations with 0.

    dcast( mydata, country ~ border, fun.aggregate = length, fill = 0 )
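
    The cast above gives one row per country value and one column per border value, so the result is not necessarily square or symmetric. If you need every country on both axes, as in the desired output, one possible approach is sketched below. It is only a sketch in base R (so it runs without extra packages), using a toy subset of the question's data. The names all_names, pairs and adj are illustrative, and it assumes a border listed in one direction should count in both.

```r
# Toy subset of the question's data; the column names `country` and `border`
# come from the question, the rows are a shortened illustration.
mydata <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Aland_Islands", "Albania"),
  border  = c("China", "Iran", NA, "Greece"),
  stringsAsFactors = FALSE
)

# Every name seen on either side becomes both a row and a column.
all_names <- c(mydata$country, mydata$border)
all_names <- sort(unique(all_names[!is.na(all_names)]))

# Rows with no neighbour (border is NA) contribute no pairs.
pairs <- mydata[!is.na(mydata$border), ]

# Fixing the factor levels forces table() to return a square table,
# including all-zero rows and columns.
adj <- unclass(table(
  factor(pairs$country, levels = all_names),
  factor(pairs$border,  levels = all_names)
))

# Mirror the matrix so that A-B implies B-A, and cap every entry at 1.
adj <- (adj + t(adj) > 0) * 1L

print(adj)
```

    Countries without any listed neighbour, such as Aland_Islands here, keep a row and column of zeros rather than being dropped.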