Search code examples
pythonalgorithmgraphsetnodes

Python Set not Detecting Duplicate Node


I have created a Graph class along with a Node class that I've tested. It is working with basic tests. I'm now trying to test with a custom input file and I'm running into an error where some of the Nodes are being duplicated. In the Graph class, there is a set called Nodes where the Node being created is added. However, in my parser file I have a checker which checks to see if the Node with the value has been added before but it's not working properly.
Even after adding in the hash function below, I still get set([Alpha, Beta, Hotel, Alpha]).
What am I doing wrong?

Test File:

directed unweighted
Alpha=Hotel
Alpha=Beta
Beta=Charlie
Charlie=Juliett
Charlie=Juliett
Delta=Foxtrot
Delta=Golf
Echo=Charlie
Echo=Delta
Foxtrot=Golf
Golf=Juliett
Golf=Alpha
Hotel=Echo
Hotel=India
India=Beta
India=Golf
Juliett=Golf

Graph and Node Class:



class Graph:
    def __init__(self):
        self.Nodes = set()

    def addVertex(self, vertex):
        self.Nodes.add(vertex)
    
    def getVertices(self):
        return self.Nodes


    @staticmethod
    def addEdgeDirect(fromEdge, toEdge, cost=1):
        fromEdge.neighbors[toEdge] = cost


class Node():
    def __init__(self, val = None):
        self.value = val
        self.neighbors = {}

    def getEdges(self):
        return self.neighbors.keys()
    
    def __repr__(self):
        return str(self.value)

Test File Parser:

from graph import Graph, Node

graph = Graph()

file1 = open('Graphs/du.gl', 'r')
Lines = file1.readlines()

direction = Lines[0].split(" ")[0].strip()
weight = Lines[0].split(" ")[1].strip() 

count = 0
if(direction == "directed"):
    if(weight == "unweighted"):
        for line in Lines:
            count += 1
            if(count == 1):
                continue
            node1 = Node(line.split("=")[0].strip())
            node2 = Node(line.split("=")[1].strip())
            if node1 not in graph.Nodes:
                print("not found, to be added")
                graph.addVertex(Node(node1))

            if node2 not in graph.Nodes:
                graph.addVertex(Node(node2))    

            print(graph.getVertices())
            graph.addEdgeDirect(node1,node2)
            # print("Line{}: {}".format(count, line.strip()))



Solution

  • In a set objects are always distinguished by their object ID (reference). So it doesn't help to define __hash__ either.

    I would suggest to:

    • Use a dictionary instead of a set
    • Only create Node instances when you already know that you need a new instance -- so after having checked that the Nodes dictionary doesn't have it yet
    • By consequence: pass the string value to addVertex instead of a Node instance.

    With some other little changes, your code could be:

    class Graph:
        def __init__(self):
            self.nodes = {}
    
        def addVertex(self, key):
            if key not in self.nodes:
                self.nodes[key] = Node(key)
            return self.nodes[key]
        
        def getVertices(self):
            return self.nodes.values()
    
        @staticmethod
        def addEdgeDirect(fromEdge, toEdge, cost=1):
            fromEdge.neighbors[toEdge] = cost
    
    
    class Node():
        def __init__(self, val=None):
            self.value = val
            self.neighbors = {}
    
        def getEdges(self):
            return self.neighbors.keys()
        
        def __repr__(self):
            return str(self.value)
        
    
    graph = Graph()
    file1 = open('du.gl', 'r')
    firstline, *lines = file1.readlines()
    direction, weight = firstline.split()
    if direction == "directed":
        if weight == "unweighted":
            for line in lines:
                graph.addEdgeDirect(*(graph.addVertex(key.strip()) 
                                      for key in line.split("=")))
                print(graph.getVertices())
    

    Addendum

    In comments you ask about getEdges and how it could return more information.

    Here is a variant of that method:

    def getEdges(self):
        return { self: self.neighbors }
    

    If you then do this:

    print(graph.nodes['Alpha'].getEdges())
    

    ...it will output:

    {Alpha: {Hotel: 1, Beta: 1}}