I am new to Spark GraphX and have a DataFrame for edges:
DataFrame: edges_main
+------------------+------------------+------------+--------+-----------+
|               src|               dst|relationship|category|subcategory|
+------------------+------------------+------------+--------+-----------+
|294201130817328347|294201131015844283|      friend|  school|      class|
|294201131015844283|294201131007361339|     brother|    home|     cousin|
|294201131015844283|294201131014451003|         son|    home|   relative|
+------------------+------------------+------------+--------+-----------+
and a DataFrame for vertices:
DataFrame: vertices_main
+------------------+-----+----+
|                id|value|name|
+------------------+-----+----+
|294201130817328347| Mary|   a|
|294201131015844283| Hola|   b|
|294201131015844283| Rama|   c|
+------------------+-----+----+
I want to preserve these additional attributes in GraphX so that I can access them with map. My code:
case class MyEdges(src: String, dst: String, attributes: MyEdgesLabel)
case class MyEdgesLabel(relationship:String,category: String ,subcategory:String)
val edges = edges_main.as[MyEdges].rdd.map { edge =>
Edge(
edge.src.toLong,
edge.dst.toLong,
//**what to mention here(MyEdgesLabel)**//
)}
case class MyVerticesLabel(name:String)
val vertices: RDD[(VertexId, Any)] = vertices_data.rdd.map(verticesRow => (
verticesRow.getLong(0),
verticesRow.getString(1))
//**what to mention here(MyVerticesLabel)**//
)
The reason for this requirement is that, after creating the graph, I want to access the additional attributes directly, like this:
val g = Graph(vertices, edges)
g.vertices.map(v => v._1 + v._2 + /*additional attributes from case class MyVerticesLabel*/).collect.mkString
g.edges.map(e => e.srcId + e.dstId + e.attr./*additional attributes from case class MyEdgesLabel*/).collect.mkString
I got some clues from the URL below, but I'm still confused about handling multiple attributes on both vertices and edges: http://www.sunlab.org/teaching/cse6250/fall2019/spark/spark-graphx.html#graph-construction
Any help would be appreciated.
You can use one case class as the edge attribute and another as the vertex property. MyEdgesLabel is already fine for the edges. To create the edge RDD, simply do:
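import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import spark.implicits._ // needed for .as[MyEdges]
// MyEdges must mirror the flat DataFrame columns, otherwise .as[MyEdges] cannot resolve them
case class MyEdges(src: String, dst: String, relationship: String, category: String, subcategory: String)
val edges: RDD[Edge[MyEdgesLabel]] = edges_main.as[MyEdges].rdd.map { edge =>
Edge(
edge.src.toLong,
edge.dst.toLong,
MyEdgesLabel(edge.relationship, edge.category, edge.subcategory)
)}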
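A minimal, self-contained sketch of the edge side, assuming spark-sql and spark-graphx are on the classpath (for example in spark-shell). It rebuilds edges_main locally from the sample rows above with the table padding trimmed, and it assumes src and dst are string columns, as the question's MyEdges declares. The local[*] master and the app name are arbitrary choices.

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Flat case class: field names match the DataFrame columns, so .as[MyEdges] can resolve them by name
case class MyEdges(src: String, dst: String, relationship: String, category: String, subcategory: String)
// The attribute carried by every GraphX edge
case class MyEdgesLabel(relationship: String, category: String, subcategory: String)

// Reuses the existing session in spark-shell, otherwise starts a local one
val spark = SparkSession.builder().master("local[*]").appName("edge-attrs").getOrCreate()
import spark.implicits._

// Sample rows from the question (assumed to be string columns)
val edges_main = Seq(
  ("294201130817328347", "294201131015844283", "friend", "school", "class"),
  ("294201131015844283", "294201131007361339", "brother", "home", "cousin"),
  ("294201131015844283", "294201131014451003", "son", "home", "relative")
).toDF("src", "dst", "relationship", "category", "subcategory")

// Edge[MyEdgesLabel]: the third constructor argument becomes e.attr
val edges: RDD[Edge[MyEdgesLabel]] = edges_main.as[MyEdges].rdd.map { edge =>
  Edge(edge.src.toLong, edge.dst.toLong,
    MyEdgesLabel(edge.relationship, edge.category, edge.subcategory))
}

// Every field of the label is reachable through e.attr
val relationships = edges.map(e => e.attr.relationship).collect().toList
val subcategories = edges.map(e => e.attr.subcategory).collect().toList
```

Because the edge RDD is typed as RDD[Edge[MyEdgesLabel]] rather than carrying Any, the compiler checks e.attr.relationship and friends, so no casts are needed later.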
For the vertices, you need to include both value and name in the case class:
case class MyVerticesLabel(value: String, name: String)
Then use it to create the vertex RDD:
val vertices: RDD[(VertexId, MyVerticesLabel)] = vertices_data.rdd.map{verticesRow =>
(verticesRow.getAs[Long]("id"),
MyVerticesLabel(verticesRow.getAs[String]("value"), verticesRow.getAs[String]("name")))
}
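A sketch of the vertex side plus graph construction, under the same classpath assumptions. The third vertex row in the question reuses ID 294201131015844283, and GraphX keeps only one attribute per vertex ID, so only the first two rows are used here. The edges also point at 294201131007361339 and 294201131014451003, which have no vertex row; Graph's optional defaultVertexAttr argument fills those in, and the placeholder value MyVerticesLabel("unknown", "unknown") is hypothetical.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

case class MyEdgesLabel(relationship: String, category: String, subcategory: String)
case class MyVerticesLabel(value: String, name: String)

val spark = SparkSession.builder().master("local[*]").appName("vertex-attrs").getOrCreate()
import spark.implicits._

// First two vertex rows from the question (the third would duplicate an ID)
val vertices_data = Seq(
  (294201130817328347L, "Mary", "a"),
  (294201131015844283L, "Hola", "b")
).toDF("id", "value", "name")

val vertices: RDD[(VertexId, MyVerticesLabel)] = vertices_data.rdd.map { row =>
  (row.getAs[Long]("id"), MyVerticesLabel(row.getAs[String]("value"), row.getAs[String]("name")))
}

// Edges built directly here to keep the sketch short
val edges: RDD[Edge[MyEdgesLabel]] = spark.sparkContext.parallelize(Seq(
  Edge(294201130817328347L, 294201131015844283L, MyEdgesLabel("friend", "school", "class")),
  Edge(294201131015844283L, 294201131007361339L, MyEdgesLabel("brother", "home", "cousin")),
  Edge(294201131015844283L, 294201131014451003L, MyEdgesLabel("son", "home", "relative"))
))

// Hypothetical default for vertices that only appear in edges
val missing = MyVerticesLabel("unknown", "unknown")
val g: Graph[MyVerticesLabel, MyEdgesLabel] = Graph(vertices, edges, missing)

// Vertex properties: pattern-match the (VertexId, MyVerticesLabel) pair
val vertexLines = g.vertices.map { case (id, attr) => s"$id ${attr.value} ${attr.name}" }.collect()
val vertexCount = g.vertices.count()

// Triplets expose both endpoint properties alongside the edge attribute
val tripletLines = g.triplets
  .map(t => s"${t.srcAttr.name} -[${t.attr.relationship}]-> ${t.dstAttr.name}")
  .collect()
```

Triplets are often the most convenient view for this kind of output, since each EdgeTriplet carries srcAttr and dstAttr next to attr, so vertex and edge attributes can be combined without an explicit join.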
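Now, after building the graph with val g = Graph(vertices, edges), the values can easily be accessed, e.g.:
g.edges.map(e => s"${e.srcId} ${e.dstId} ${e.attr.relationship}").collect.mkString
g.vertices.map { case (id, attr) => s"$id ${attr.value} ${attr.name}" }.collect.mkString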
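One detail worth checking in expressions like e.srcId + e.dstId + ... from the question: Scala evaluates + left to right, so the two Long IDs are added numerically before any String is appended. This plain-Scala sketch (no Spark needed) uses the first edge's IDs and relationship as stand-ins for e.srcId, e.dstId and e.attr.relationship.

```scala
// Stand-ins for e.srcId, e.dstId and e.attr.relationship of the first edge
val srcId: Long = 294201130817328347L
val dstId: Long = 294201131015844283L
val relationship = "friend"

// Left to right: Long + Long is numeric addition, then the sum is concatenated with the String
// (compiles, but Scala 2.13+ may emit a deprecation warning for number + String)
val concatenated = srcId + dstId + relationship

// String interpolation keeps the two IDs separate and readable
val interpolated = s"$srcId $dstId $relationship"
```

Here concatenated holds the summed IDs followed by "friend", which is rarely what is wanted; interpolation avoids the problem and also sidesteps the deprecated number-plus-String overload.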