I am trying to combine historical data, which comes from an ancient custom-made email system, to create a database with Python. One list (b) contains the email ids, and another list (a) contains the filenames of the attachments. An email may have zero, one, or many attachments. There are thousands of records to process.
I have extracted the data in the following format:
a = [[], ['a'], ['b', 'c', 'd']]
b = ['c1', 'c2', 'c3']
I want the empty entries in 'a' removed and the remaining data in the following format, but I don't care whether it is a list or a tuple:
x = [[['c2', 'a']], [['c3', 'b'], ['c3', 'c'], ['c4', 'd']]]
I have tried using zip:
x = zip(b, a)
But that just put the email id in front of each attachment list, and kept the empty one:
(('c1', []), ('c2', ['a']), ('c3', ['b', 'c', 'd']))
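In Python 3, zip returns a lazy iterator, so the tuple shown above is what you see after materializing it (for example with tuple()). Each pair holds the id and the whole attachment list, so the empty pair can be dropped with a truthiness test before anything else. A minimal sketch using the sample data from the question:

```python
a = [[], ['a'], ['b', 'c', 'd']]
b = ['c1', 'c2', 'c3']

# zip() is lazy in Python 3; tuple() materializes the (id, attachments) pairs
pairs = tuple(zip(b, a))
print(pairs)  # (('c1', []), ('c2', ['a']), ('c3', ['b', 'c', 'd']))

# An empty list is falsy, so this keeps only emails that have attachments
with_attachments = [(email_id, files) for email_id, files in pairs if files]
print(with_attachments)  # [('c2', ['a']), ('c3', ['b', 'c', 'd'])]
```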
I tried itertools.chain:
import itertools
op = [list(itertools.chain(*i)) for i in zip(b, a)]
But that yielded:
[['c', '1'], ['c', '2', 'a'], ['c', '3', 'b', 'c', 'd']]
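The single characters appear because itertools.chain(*i) unpacks each pair into two iterables, and iterating over the string 'c1' yields 'c' and '1'. If you want to stay with itertools, itertools.product can pair one id with every filename without splitting the string. This is a sketch that swaps chain for product; it yields tuples rather than lists, which the question says is acceptable:

```python
import itertools

a = [[], ['a'], ['b', 'c', 'd']]
b = ['c1', 'c2', 'c3']

# Why chain split the ids: a string is itself an iterable of characters
print(list(itertools.chain('c1', [])))  # ['c', '1']

# product([email_id], files) yields (email_id, filename) for each filename;
# emails without attachments are skipped with `if files`
op = [list(itertools.product([email_id], files))
      for email_id, files in zip(b, a) if files]
print(op)  # [[('c2', 'a')], [('c3', 'b'), ('c3', 'c'), ('c3', 'd')]]
```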
I have also tried using re.findall() to get the data into a more usable format, but the number of email ids will usually not match the number of filenames. There is plenty of material about lists and joining, but I haven't found anything useful about a list of lists where the inner lists vary in length.
I hope I've understood your question right (in your output you have c4, but I think it should be c3):
a = [[], ["a"], ["b", "c", "d"]]
b = ["c1", "c2", "c3"]
out = [[[v, s] for s in l] for v, l in zip(b, a) if l]  # skip emails with no attachments
print(out)
Prints:
[[['c2', 'a']], [['c3', 'b'], ['c3', 'c'], ['c3', 'd']]]
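Since the end goal is a database, a flat list of (email_id, filename) rows may be easier to load than the nested structure, because each tuple maps to one table row and empty attachment lists simply contribute no rows. A minimal sketch using the standard-library sqlite3 module with an in-memory database; the table name attachments and its two columns are hypothetical:

```python
import sqlite3

a = [[], ['a'], ['b', 'c', 'd']]
b = ['c1', 'c2', 'c3']

# One (email_id, filename) tuple per attachment; empty lists yield nothing
rows = [(email_id, filename) for email_id, files in zip(b, a) for filename in files]
print(rows)  # [('c2', 'a'), ('c3', 'b'), ('c3', 'c'), ('c3', 'd')]

# Hypothetical schema: one row per attachment, keyed by the email id
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE attachments (email_id TEXT, filename TEXT)')
conn.executemany('INSERT INTO attachments VALUES (?, ?)', rows)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM attachments').fetchone()[0]
print(count)  # 4
conn.close()
```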