excel vba graph gephi

How to find edges from data in Excel


I'm trying to find the relations (edges) between nodes using Excel and VBA, and I will use the output in Gephi. The data I have in Excel is very large, so here is a small example to illustrate my question about finding the correct relations.

If I have this data:

  'data for id_books that user_id borrowed
    user_id        id_book     book
    1                55        physic           
    2                55        physic
    2                55        physic
    3                55        physic
    4                55        physic

This is the output, showing the users who borrowed the same book from the library:

    nodes(user_id):       edges(relation between user_id)
                           source,target
      1                    1,2
      2                    1,3
      3                    1,4 
      4                    2,3
                           2,4
                           2,3
                           2,4

Is it correct to show 1,2 just once?


Solution

  • There are two closely related structures in discrete mathematics: graphs and multigraphs. A graph is a set of nodes together with a set of pairs of nodes (the edges). If you want a graph whose nodes are users and whose edges represent the relation "has borrowed the same book at least once", then it makes no sense to list an edge like (1,2) more than once. In a multigraph, on the other hand, edges can be repeated. With a single book, as in your example, storing (1,2) multiple times would tell you that user 1 and user 2 have borrowed the same book and that at least one of them borrowed it at least twice; with several books in the data, repeated edges could also mean that the two users share more than one book. If you would find that information useful, use a multigraph. Otherwise, use a graph. I would expect a tool like Gephi to be able to draw both, so in that sense it really is up to you. Note, however, that drawings of multigraphs can be harder to read because they have more visual clutter. I dislike cluttered diagrams, so I would probably use a single-edge graph, but that is a personal preference. You might have a strong reason to prefer a multigraph in your intended application.
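To make the difference concrete, here is a minimal sketch of both edge lists built from the sample data in the question. It is written in Python rather than VBA, purely to illustrate the logic; the names `records`, `simple_edges` and `multi_edges` are made up for this example, and the same grouping could be done in VBA with a dictionary keyed by `id_book`.

```python
from collections import Counter
from itertools import combinations

# Borrow records from the question as (user_id, id_book).
# User 2 appears twice because that user borrowed book 55 twice.
records = [(1, 55), (2, 55), (2, 55), (3, 55), (4, 55)]


def simple_edges(records):
    """Graph: one edge per pair of distinct users who share at least one book."""
    users_by_book = {}
    for user, book in records:
        users_by_book.setdefault(book, set()).add(user)
    edges = set()
    for users in users_by_book.values():
        # sorted() makes every pair come out as (smaller id, larger id)
        edges.update(combinations(sorted(users), 2))
    return sorted(edges)


def multi_edges(records):
    """Multigraph: count one edge per pair of rows for the same book
    that belong to two different users."""
    rows_by_book = {}
    for user, book in records:
        rows_by_book.setdefault(book, []).append(user)
    counts = Counter()
    for users in rows_by_book.values():
        for a, b in combinations(users, 2):
            if a != b:  # skip a user paired with their own repeat borrow
                counts[(min(a, b), max(a, b))] += 1
    return counts


print(simple_edges(records))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(sorted(multi_edges(records).items()))
# [((1, 2), 2), ((1, 3), 1), ((1, 4), 1), ((2, 3), 2), ((2, 4), 2), ((3, 4), 1)]
```

For the sample data the simple graph has six edges, including 3,4. In the multigraph every pair involving user 2 is counted twice, because user 2 has two rows for book 55. If you prefer single edges but do not want to lose that information, the count could be kept as an extra weight column next to source and target.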