I'm trying to concatenate the values of two keys in Apache Beam to get a new list made up of all the items from both keys.
Suppose I have a PCollection as follows:
(
"Key1": [file1, file2],
"Key2": [file3, file4],
)
How do I get a PCollection that looks like this, using the Python Apache Beam SDK?
(
"Key3": [file1, file2, file3, file4]
)
I solved this using the following code:
new_pcol = (
    (
        pcol1,
        pcol2,
    )
    | "Flatten" >> beam.Flatten()
    | "Format flat" >> beam.MapTuple(lambda k, files: ("Key3", files))
    | "Group by new key" >> beam.GroupByKey()
)
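
One caveat, assuming each element of pcol1 and pcol2 is a (key, list_of_files) pair: MapTuple keeps each list whole, so GroupByKey would group the two lists under "Key3" as a list of lists, something like [[file1, file2], [file3, file4]], rather than one flat list. If a flat list is what you need, a possible variant is to emit one ("Key3", file) pair per file with FlatMapTuple before grouping. The following is only a minimal sketch under that assumption; the names rekey_each_file, merge_under_one_key and NEW_KEY are made up for illustration and are not part of Beam.

```python
NEW_KEY = "Key3"  # hypothetical name for the merged key


def rekey_each_file(_old_key, files):
    """Yield one (NEW_KEY, file) pair per file, discarding the old key."""
    for f in files:
        yield NEW_KEY, f


def merge_under_one_key(pcol1, pcol2):
    """Combine two PCollections of (key, list_of_files) into one flat group."""
    # Imported here so the helper above can be used and tested
    # without Beam installed.
    import apache_beam as beam

    return (
        (pcol1, pcol2)
        | "Flatten" >> beam.Flatten()
        # FlatMapTuple unpacks each (key, files) element and emits every
        # yielded pair as its own element.
        | "One pair per file" >> beam.FlatMapTuple(rekey_each_file)
        | "Group by new key" >> beam.GroupByKey()
        # GroupByKey yields an iterable of values; turn it into a list.
        | "To list" >> beam.MapTuple(lambda k, files: (k, list(files)))
    )
```

Calling merge_under_one_key(pcol1, pcol2) inside a pipeline should then give a single ("Key3", [...]) element containing all four files. Beam does not guarantee the order of grouped values, so the files may not come out in their original order.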