Convert set of tuples ('Id', 'row 1'), ('Id', 'row 2') to list ['Id', ['row 1', 'row 2']] in Python

Good afternoon,

As the title states, I'm trying to convert a set of tuples that share the same value in the first position but differ in the second into a single list grouped by that first value.

I'm sure there's a pretty simple way to build this list, but I'm fairly new to the language and I'm struggling to do so.

I've tried making a dictionary, but found that dictionaries require unique keys; otherwise the original value is overwritten.

The purpose of this conversion is to post these records to the Smartsheet API by id. I would like to follow their suggestion of bulk processing: pinging the sheet once per n records rather than n times.

Any advice would be greatly appreciated!

Thanks, Channing

Solution

I'd do this in two separate steps. First, make a dictionary where the key is the first element of your tuple and the value is a list of all the second elements that share that first element.

Second, interleave the keys and values into a proper list.

import itertools

# your initial set of tuples
tuples = {('Id', 'row1'), ('Id', 'row2'), ('Id2', 'row3')}

# create a dict, as described above:
#    key is the first element of each tuple
#    value is a list of the second elements of those tuples
dct = {}
for t in tuples:
    dct.setdefault(t[0], []).append(t[1])
print(dct)
# e.g. {'Id2': ['row3'], 'Id': ['row1', 'row2']}
# (sets are unordered, so the order of keys and of values may differ between runs)

# coalesce the dict's keys and values into a list
# we use itertools.chain to make this more straightforward,
# but it's essentially concatenating the tuple elements of dct.items() to
# each other, by using the unpacking operator `*` to provide them individually
# as arguments.
outp = list(itertools.chain(*dct.items()))
print(outp)
# e.g. ['Id2', ['row3'], 'Id', ['row1', 'row2']]
# (same caveat: the order follows the dict, which follows the set's iteration order)

This has linear time complexity: it runs through each element of the input (tuples) once to build the dictionary, then through each distinct key once to build the output list.
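
If you want the same result in a reproducible order, one variation is to sort the tuples before grouping and to use collections.defaultdict instead of setdefault. This is only a sketch: group_pairs is a hypothetical helper name I made up for illustration, and it assumes Python 3.7+ (where dicts keep insertion order) and that the tuple elements are sortable.

```python
from collections import defaultdict
from itertools import chain


def group_pairs(pairs):
    """Group (key, value) pairs into a flat [key, [values], key, [values], ...] list.

    `group_pairs` is a hypothetical helper name, not part of any library.
    """
    grouped = defaultdict(list)
    # Sort first so the result is reproducible; sets have no guaranteed order.
    for key, value in sorted(pairs):
        grouped[key].append(value)
    # Flatten the (key, values) pairs of the dict into one list.
    return list(chain.from_iterable(grouped.items()))


tuples = {('Id', 'row1'), ('Id', 'row2'), ('Id2', 'row3')}
result = group_pairs(tuples)
print(result)
# ['Id', ['row1', 'row2'], 'Id2', ['row3']]
```

Note that the sort makes this O(n log n) rather than linear, so drop the sorted() call if you don't care about the order.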