I have a text file as follows:
1 3
2 5
3 6
4 5
5 4
6 1
7 2
The file above represents the edges of an undirected graph. I want to remove the duplicate edges. In the example above, I want to remove either 4,5 or 5,4, as they represent the same edge and therefore cause duplication.
I am trying to visualize the graph from the file using GraphStream and the GraphX library in Apache Spark. But because of the duplicate edges described above, it gives the following error:
org.graphstream.graph.EdgeRejectedException: Edge 4[5--4] was rejected by node 5
What would be the best way to remove such duplicates from the text file?
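You can use the convertToCanonicalEdges method from GraphOps. It rewrites each edge so that the source ID is smaller than the destination ID and merges the resulting duplicates.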
In your case:
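import org.apache.spark.graphx.Graph

// sc is an existing SparkContext; -1 is the default vertex attribute
val graph = Graph.fromEdgeTuples(sc.parallelize(
  Seq((1, 3), (2, 5), (3, 6), (4, 5), (5, 4), (6, 1), (7, 2))), -1)
graph.convertToCanonicalEdges().edges.collect.foreach(println)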
with result:
Edge(3,6,1)
Edge(1,6,1)
Edge(1,3,1)
Edge(2,5,1)
Edge(2,7,1)
Edge(4,5,1)
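If you would rather clean the text file itself before it reaches GraphX or GraphStream, the same idea (order each pair, then drop repeats) can be sketched in plain Scala without Spark. This is only an illustration: DedupEdges and canonicalEdges are hypothetical names, not part of any library, and the sketch assumes each line holds two whitespace-separated integer IDs.

```scala
// Hypothetical helper, not part of GraphX or GraphStream.
// Assumes every non-empty line has two whitespace-separated integer IDs.
object DedupEdges {
  def canonicalEdges(lines: Seq[String]): Seq[(Long, Long)] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { line =>
        // Throws a MatchError if a line has fewer than two fields.
        val Array(a, b) = line.split("\\s+").take(2).map(_.toLong)
        // Order each pair so that "5 4" and "4 5" become the same tuple.
        if (a <= b) (a, b) else (b, a)
      }
      .distinct // keeps the first occurrence of each edge, in input order

  def main(args: Array[String]): Unit = {
    // Sample data copied from the question.
    val lines = Seq("1 3", "2 5", "3 6", "4 5", "5 4", "6 1", "7 2")
    canonicalEdges(lines).foreach { case (a, b) => println(s"$a $b") }
  }
}
```

For the sample data this prints the six edges 1 3, 2 5, 3 6, 4 5, 1 6 and 2 7. To work on a real file, the lines could be read with scala.io.Source.fromFile(path).getLines().toSeq, where path is whatever location your file has.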