I have extracted the links between Wikipedia pages into an RDD with the following format:
Array[(String, String)] = Array((AccessibleComputing,[Computer accessibility]),
(Anarchism,[political philosophy, stateless society]))
Here the first string is a page (vertex) and the second is a list of links (edges) pointing to other Wikipedia pages.
How can I convert it into a graph-friendly format like this:
Array(
(AccessibleComputing,Computer accessibility),
(Anarchism,stateless society),
(Anarchism,political philosophy)
)
so that each page is repeated once for every link it contains?
How about drop, split and flatMap?
data.flatMap{case (k, v) => v.drop(1).dropRight(1).split(", ").map((k, _))}
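For anyone who wants to try the idea without a Spark cluster, here is a minimal sketch of the same transformation on a plain Scala Seq. The names LinksToEdges and toEdges are made up for illustration, and it assumes the links arrive as a single bracketed, comma-and-space separated string, as in the sample above. It uses stripPrefix/stripSuffix instead of drop(1)/dropRight(1) so that a string without brackets is left intact, and it filters out the empty string that an empty list "[]" would otherwise produce. The body of the flatMap should carry over to an RDD[(String, String)] unchanged apart from the final toSeq.

```scala
object LinksToEdges {
  // Turn (page, "[a, b]") pairs into one (page, link) pair per link.
  def toEdges(data: Seq[(String, String)]): Seq[(String, String)] =
    data.flatMap { case (page, links) =>
      links
        .stripPrefix("[")            // remove the opening bracket, if present
        .stripSuffix("]")            // remove the closing bracket, if present
        .split(", ")                 // one element per link
        .filter(_.nonEmpty)          // "[]" would otherwise yield one empty link
        .map(link => (page, link))   // repeat the page for every link
        .toSeq
    }

  def main(args: Array[String]): Unit = {
    // Sample data copied from the question.
    val data = Seq(
      ("AccessibleComputing", "[Computer accessibility]"),
      ("Anarchism", "[political philosophy, stateless society]")
    )
    toEdges(data).foreach(println)
  }
}
```

Note that splitting on ", " will also break up any link title that itself contains a comma followed by a space, so real Wikipedia titles may need a different separator at extraction time.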