I have a bit of a logical challenge. I have a single table in excel that contains an identifier column and a cross reference column. There can be multiple rows for a single identifier which indicates multiple cross references. (see basic example below)
Any record that ends in the letter "X" indicates that it is a cross reference, and not an actual identifier. I need to generate a list of the cross references for each identifier, but trace it down to the actual cross reference identifier. So using "A1" as an example from the table above, I would need the list returned as follows "A2,A3,B1,B3". Notice there are no identifiers ending in "X" in the list, they have been traced down to the actual source record through the table.
Any ideas or help would be much appreciated. I'm using python and xlrd to read the table.
t = [
["a1","a2"],
["a1","a3"],
["a1","ax"],
["ax","b1"],
["ax","bx"],
["bx","b3"]
]
import itertools
def find_matches(t,key):
return list(itertools.chain(*[[v] if not v.endswith("x") else find_matches(t,v) for k,v in t if k == key]))
print find_matches(t,"a1")
you could treat your list as an adjacency matrix of a graph
something like
t = [
["a1","a2"],
["a1","a3"],
["a1","ax"],
["ax","b1"],
["ax","bx"],
["bx","b3"]
]
class MyGraph:
def __init__(self,adjacency_table):
self.table = adjacency_table
self.graph = {}
for from_node,to_node in adjacency_table:
if from_node in self.graph:
self.graph[from_node].append(to_node)
else:
self.graph[from_node] = [to_node]
print self.graph
def find_leaves(self,v):
seen = set(v)
def search(v):
for vertex in self.graph[v]:
if vertex in seen:
continue
seen.add(vertex)
if vertex in self.graph:
for p in search(vertex):
yield p
else:
yield vertex
for p in search(v):
yield p
print list(MyGraph(t).find_leaves("a1"))#,"a1")