python tuples nested-lists destructuring

How to separate list of lists having tuples (as if from a word2vec most_similar results) into individual variables?


I am writing a program to find the most similar words for a list of input words using Gensim for word2vec.

Example: input_list = ["happy", "beautiful"]

Further, I use a for loop to iterate over the list and store the output in a list data structure using the .append() function.

The final list is a list of lists of tuples, as shown below:

results = [[('glad', 0.7408891320228577),
('pleased', 0.6632170081138611)],
[('gorgeous', 0.8353002667427063),
('lovely', 0.8106935024261475)]]

My question is how to separate the list of lists into independent lists? I followed answers from 1 and 2 which suggest unpacking like a, b = results.

But this is possible when you know the number of input elements (here 2).

Expected Output (based on above):

list_a = [('glad', 0.7408891320228577), ('pleased', 0.6632170081138611)]
list_b = [('gorgeous', 0.8353002667427063), ('lovely', 0.8106935024261475)]

But, if the number of input elements is always variable, say 4 or 5, then how do we unpack and get a reference to the independent lists at run-time?

Or what is a better data structure to store the above results so that unpacking or further processing is friendlier?

Kindly help.


Solution

  • If you have a variable number of query-words - sometimes 2, sometimes 5, sometimes any other number N – then you almost certainly do not want to bring those out into totally-separate variable names (like list_a, list_b, etc).

    Why not? Well, your next steps will then likely be to do something to each of the N items.

    And to do that, you'll then want them in some sort of indexed-list you can iterate over.

    What if instead, they're in some bunch of local variables - list_a, list_b, list_c, list_d - like you've requested? Then in the case where there's only 3, some of those variables, like list_d, either won't exist (be undefined) or will hold some different signal value (like say None).

    For most tasks, that will then be harder to work with, requiring awkward branches/tests for every possible count of results.

    Instead, your existing results is already a list whose items you can access by numeric index (results[0], results[1]), either individually or in a loop. That is a far more useful structure when the count of things you're dealing with will vary.

    If you think you have a valid reason for your expected end-state, please describe the reason, and especially the next things you then want to do, in more detail, via an expansion to the question. And consider those next steps for several different scenarios: just 1 set of results, 2 sets of results, 5 sets of results, 100 sets of results. (In that last case, what would you even name the variables, beyond list_z?)

    (Separately, this is not really a question about Gensim or word2vec, but about core Python language features and variable/data-structure handling. So I've removed those tags, and added destructuring, a term for the sort of multiple-variable assignment that almost does what you need but isn't quite right, and will tune the title a bit.)
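As a rough illustration of the point above about keeping results as an indexed list, here is a minimal sketch using the sample data from the question. It assumes results was built in the same order as input_list. The names unpack_error, best_matches and results_by_word are hypothetical, introduced only for this example, and the dict at the end is one optional alternative rather than something the question requires.

```python
# Sample data copied from the question. In a real program each inner list
# would come from the word2vec model; no model is loaded here.
input_list = ["happy", "beautiful"]
results = [
    [("glad", 0.7408891320228577), ("pleased", 0.6632170081138611)],
    [("gorgeous", 0.8353002667427063), ("lovely", 0.8106935024261475)],
]

# Fixed-count unpacking only works when the number of names matches the
# number of items, so it breaks as soon as the input size changes.
unpack_error = None
try:
    list_a, list_b, list_c = results  # three names, but only two result sets
except ValueError as err:
    unpack_error = str(err)

# Looping over the list works unchanged for 1, 2, 5 or 100 query words.
# zip() pairs each query word with its result set (assumes matching order).
best_matches = []
for query, neighbours in zip(input_list, results):
    top_word, top_score = neighbours[0]  # unpack one (word, score) tuple
    best_matches.append((query, top_word))

# Optional alternative (an assumption, not from the original answer):
# a dict keyed by query word, for lookup by name instead of by position.
results_by_word = dict(zip(input_list, results))

print(best_matches)
print(results_by_word["beautiful"])
```

The loop and the dict both behave the same regardless of how many query words are supplied, which is the property the separate list_a, list_b variables lack.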