I am banging my head trying to convert the following Spark RDD data using the code below it:
[('4', ('1', '2')),
('10', ('5',)),
('3', ('2',)),
('6', ('2', '5')),
('7', ('2', '5')),
('1', None),
('8', ('2', '5')),
('9', ('2', '5')),
('2', ('3',)),
('5', ('4', '2', '6')),
('11', ('5',))]
def adjDang(line, tc):
    node, edges = line
    print(f'node {node} edges {edges}')
    if edges is None:
        return (int(node), (0, 0))
    else:
        if len(edges) == 1:
            newedges = edges[0]  # unwrap the 1-tuple, e.g. ('5',) -> '5' (see node '11')
        else:
            newedges = ()
            for i in range(len(edges)):
                newedges += (edges[i],)  # wrap each edge in a 1-tuple; tuple + str raises TypeError
        print(f'node {node} edges {newedges}')
        return (int(node), (1 / tc, newedges))
I am getting the following output:
[(4, (0.09090909090909091, ('1', '2'))),
(10, (0.09090909090909091, '5')),
(3, (0.09090909090909091, '2')),
(6, (0.09090909090909091, ('2', '5'))),
(7, (0.09090909090909091, ('2', '5'))),
(1, (0, 0)),
(8, (0.09090909090909091, ('2', '5'))),
(9, (0.09090909090909091, ('2', '5'))),
(2, (0.09090909090909091, '3')),
(5, (0.09090909090909091, ('4', '2', '6'))),
(11, (0.09090909090909091, '5'))]
I expect the output in the format (node_id, (score, edge_1, edge_2, ...)). For example, node 5 should look like (5, (0.09090909090909091, 4, 2, 6)): the extra inner brackets around the edges should go away so that there is one flat tuple after the node id, and the edges should be integers.
I would appreciate any pointers on how to achieve this.
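For reference, the target shape described above is one flat tuple: the score followed by the integer edges. A minimal sketch in plain Python (no Spark involved) builds it for node 5 by concatenating a one-element score tuple with the converted edges. The value tc = 11 is an assumption, inferred from the 11 nodes in the sample and the 0.0909... score. Tuple concatenation like this also works on Python versions older than 3.5.

```python
tc = 11  # assumption: 11 nodes in the sample RDD, so 1 / tc == 0.0909...
node, edges = ('5', ('4', '2', '6'))  # one element of the RDD above

# (score,) + (4, 2, 6) concatenates into one flat tuple instead of nesting it
result = (int(node), (1 / tc,) + tuple(int(e) for e in edges))
print(result)  # (5, (0.09090909090909091, 4, 2, 6))
```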
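If you're using Python 3.5 or above, just change the return statement to:
return (int(node), (1 / tc, *newedges))
(This is the same as what you have, but with a * in front of newedges. The * unpacks the edges into the enclosing tuple instead of nesting them as a single element.)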
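Since the star unpacks any iterable, the single-edge special case and the loop can be dropped entirely, and map(int, ...) takes care of the integer requirement. One caveat: in the question's code a single edge is stored as a bare string such as '5', and *'5' only works because the id is one character long; *'10' would unpack to '1', '0'. Unpacking the original edges tuple directly avoids that. A minimal sketch of a rewritten adjDang, keeping the question's (0, 0) convention for nodes without edges and dropping the debug prints:

```python
def adjDang(line, tc):
    """Return (node_id, (score, edge_1, edge_2, ...)) with integer ids."""
    node, edges = line
    if edges is None:
        # dangling node: keep the (0, 0) placeholder used in the question
        return (int(node), (0, 0))
    # *map(int, edges) splices each converted edge into the tuple (Python 3.5+)
    return (int(node), (1 / tc, *map(int, edges)))


print(adjDang(('5', ('4', '2', '6')), 11))  # (5, (0.09090909090909091, 4, 2, 6))
print(adjDang(('11', ('5',)), 11))          # (11, (0.09090909090909091, 5))
print(adjDang(('1', None), 11))             # (1, (0, 0))
```

On the RDD itself the call would look something like rdd.map(lambda line: adjDang(line, tc)), where rdd and tc are whatever the question already uses.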