Search code examples
pythonexcelmany-to-manyrelationshipxlrd

Python - Iterative self cross referencing


I have a bit of a logical challenge. I have a single table in excel that contains an identifier column and a cross reference column. There can be multiple rows for a single identifier which indicates multiple cross references. (see basic example below)

enter image description here

Any record that ends in the letter "X" indicates that it is a cross reference, and not an actual identifier. I need to generate a list of the cross references for each identifier, but trace it down to the actual cross reference identifier. So using "A1" as an example from the table above, I would need the list returned as follows "A2,A3,B1,B3". Notice there are no identifiers ending in "X" in the list, they have been traced down to the actual source record through the table.

Any ideas or help would be much appreciated. I'm using python and xlrd to read the table.


Solution

  • t = [
        ["a1","a2"],
        ["a1","a3"],
        ["a1","ax"],
        ["ax","b1"],
        ["ax","bx"],
        ["bx","b3"]
    ]
    
    import itertools
    def find_matches(t,key):
        return list(itertools.chain(*[[v] if not v.endswith("x") else find_matches(t,v) for k,v in t if k == key]))
    
    print find_matches(t,"a1")
    

    you could treat your list as an adjacency matrix of a graph

    something like

    t = [
        ["a1","a2"],
        ["a1","a3"],
        ["a1","ax"],
        ["ax","b1"],
        ["ax","bx"],
        ["bx","b3"]
    ]
    class MyGraph:
        def __init__(self,adjacency_table):
            self.table = adjacency_table
            self.graph = {}
            for from_node,to_node in adjacency_table:
                if from_node in self.graph:
                    self.graph[from_node].append(to_node)                
                else:
                    self.graph[from_node] = [to_node]
            print self.graph
        def find_leaves(self,v):
            seen = set(v)
            def search(v):
                for vertex in self.graph[v]:
                    if vertex in seen:
                        continue
                    seen.add(vertex)
                    if vertex in self.graph:
                        for p in search(vertex):
                           yield p
                    else:
                        yield vertex
            for p in search(v):
                yield p
    
    
    
    print list(MyGraph(t).find_leaves("a1"))#,"a1")