r data-mining complex-networks sna

Starting with Complex Networks / SNA. Turning datasets into expected format


I'm just starting to get to grips with the ideas and techniques behind complex networks and social network analysis. I always seem to trip up and get stuck at the data preparation stage. I often have a dataset in Google Refine that is basically a bunch of rows that are somehow related. For example, at present I have a list of organisations and the events they have attended (with some duplicates, as an organisation may have sent more than one delegate to an event).

[Image: My Google Refine Data]

So I can see that organisations would be nodes on my graph and that a relationship between two of them exists if they both attended the same event. However, I don't know how to turn this dataset into a format that a tool such as NWB, Gephi, R or Tulip would understand.

I often find myself in a situation where I have a dataset and can see the relationship between its columns, but I don't know what steps to take next to prepare the data for import into such tools so that I can explore the relationship. I've poked around the documentation for supported file types, and my guess is that doing something with the RDF skeleton tool in Refine and taking a linked-data-style approach may be a possible solution, but I am having no luck.

Any tips for data preparation would be appreciated.


Solution

  • Just in case anybody stumbles across this in the future: I did this by importing my CSV into R, creating a one-mode matrix, and then building a graph from that.

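    # Read the organisation/event pairs, one row per delegate attendance.
    df <- read.csv("/Users/David/Desktop/PhD/R_github/ROI/data/Ins_Event.csv", header = TRUE, sep = ",")
    
    # Cross-tabulate the two columns into an incidence matrix
    # (rows = first column, columns = second column; duplicate rows give counts above 1).
    M <- as.matrix(table(df))
    
    # One-mode projection onto the row entities (co-occurrence counts).
    Mrow <- M %*% t(M)
    
    # One-mode projection onto the column entities, if needed instead:
    # Mcol <- t(M) %*% M
    
    write.csv(Mrow, "test.csv")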
    

    I also blogged the answer here, in case it helps: http://www.davidsherlock.info/network-analysis
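
For anyone who wants to see the projection on a small example, here is a minimal base-R sketch. The data, the column names (`organisation`, `event`) and the output file name `edges.csv` are all made up for illustration. Unlike the snippet above, it drops duplicate rows first, so each weight is the number of shared events rather than being inflated by delegate counts, and it writes an edge list rather than a full matrix. Gephi's spreadsheet import should accept a table with `Source`, `Target` and `Weight` columns, but check this against the version you use.

```r
# Toy attendance data (hypothetical names). The last row repeats OrgA/Ev1,
# as if OrgA had sent two delegates to the same event.
df <- data.frame(
  organisation = c("OrgA", "OrgA", "OrgB", "OrgB", "OrgC", "OrgA"),
  event        = c("Ev1",  "Ev2",  "Ev1",  "Ev2",  "Ev2",  "Ev1"),
  stringsAsFactors = FALSE
)

# Drop duplicate rows so each organisation counts once per event.
df <- unique(df)

# Organisation-by-event incidence matrix (1 = attended, 0 = did not).
M <- unclass(table(df$organisation, df$event))

# One-mode projection: shared[i, j] = number of events that
# organisations i and j both attended.
shared <- M %*% t(M)

# Keep each unordered pair once (upper triangle, which also skips the
# diagonal) and drop pairs that share no event.
idx <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)

edges <- data.frame(
  Source = rownames(shared)[idx[, "row"]],
  Target = colnames(shared)[idx[, "col"]],
  Weight = shared[idx],
  stringsAsFactors = FALSE
)

# Write the edge list without R's row-name column.
write.csv(edges, "edges.csv", row.names = FALSE)
```

For the toy data this gives three edges: OrgA and OrgB with weight 2, and OrgA to OrgC and OrgB to OrgC with weight 1 each. Keeping only the upper triangle treats the network as undirected, which fits a "both attended the same event" relationship.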