Search code examples
rdataframedata-cleaningadjacency-matrix

What is the most efficient process to reconstruct a dyadic dataframe from an adjacency matrix?


I apologize for what I imagine is a fairly simple question. Unfortunately while my searches on here have returned a number of results for making adjacency matrices from dyadic dataframes, I haven't come across anything for the opposite process - creating a dyadic dataframe from an adjacency matrix.

Here is a subset of the xls file I am working from in R:

ccode           2       20      31     40      41      42   
    year        2010    2010    2010   2010    2010    2010 
        abbrev  USA     CAN     BHM    CUB     HAI     DOM  
2   2010    USA 0       1       1      1       1       1    
20  2010    CAN 1       0       0      1       1       1    
31  2010    BHM 1       1       0      1       1       0    
40  2010    CUB 1       1       1      0       1       1    
41  2010    HAI 1       1       1      1       0       1    
42  2010    DOM 1       1       0      1       1       0    
51  2010    JAM 1       1       0      1       0       0    

I want it to look like this:

ccode   ccode2  year    Value       
2       20      2010    1   
2       31      2010    1
2       40      2010    1        
...    
20      31      2010    0     
20      40      2010    1 
20      41      2010    1       
...       

What R package(s)/code is necessary to perform such a transformation?

For those wishing to access the full data, it is the DIPCON 3.0 database and it can be found here: https://www.volgy.org/projects-and-data


Solution

  • path = "DIPCON_3.0.xlsx"# Put the correct path to your file
    library(readxl) 
    sheets = excel_sheets(path)
    
    my_read = function(x){
      dat = read_excel(path,x)
      c_names = 1:4# The column names-also same as the row names
      col_names = do.call(paste,data.frame(t(dat[c_names,-c_names])))
      row_names = do.call(paste,dat[-c_names,c_names])
      dat1 = as.table(matrix(as.numeric(unlist(dat[-c_names,-c_names])),
                    nrow(dat)-4,dim=list(row_names,col_names)))
      d = data.frame(dat1)
      l = nrow(d)
      proto = data.frame(ccode=numeric(l),Year=numeric(l),C1=character(l),C2=character(l))
      m = do.call(cbind,lapply(d[2:1],function(x) strcapture("(\\d+) (\\d+) (\\w+) (\\w+)",x,proto)))
      cbind(m,d[3])
    }
    
    my_read(sheets[1])