
PYTHON 2.7 - Modifying List of Lists and Re-Assembling Without Mutating


I currently have a list of lists that looks like this:

My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'], ['This', 'too', 'is', 'a', 'sample', 'text'], ['finally', 'so', 'is', 'this', 'one']]

Now what I need to do is "tag" each of these words with one of three (in this case arbitrary) tags such as "EE", "FF", or "GG", based on which tag list the word is in, and then reassemble the words in the same order they came in. The final result would need to look like this:

GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

My_List = [[('This', 'GG'), ('Is', 'FF'), ('A', 'FF'), ('Sample', 'EE'), ('Text', 'FF'), ('Sentence', 'GG')], [*same with this sentence*], [*and this one*]]

I tried this by using for loops to turn each item into a dict, but the dicts then got rearranged by their tags. That can't happen here: the experiment needs everything to stay in its original order, because eventually I need to measure the proximity of tags relative to one another, but only within the same sentence (list).

I thought about doing this with NLTK (which I have little experience with), but it looks much more sophisticated than what I need, and the tags aren't easily customized by a novice like myself.

I think this could be done by iterating through each of these items, using an if statement as I have to determine what tag they should have, and then making a tuple out of the word and its associated tag so it doesn't shift around within its list.

I've devised this, but I can't figure out how to rebuild my list of lists and keep everything in order :(

for i in My_List:  # For each list in the list of lists
    for h in i:  # For each item in each list
        if h in GG_List:  # Check for the tag
            MyDicts = {"GG":h for h in i}  # Make dict from tag + word

Thank you so much for your help!


Solution

  • Putting the tags in a dictionary would work:

    My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
               ['This', 'too', 'is', 'a', 'sample', 'text'],
               ['finally', 'so', 'is', 'this', 'one']]
    GG_List = ['This', 'Sentence']
    FF_List = ['Is', 'A', 'Text']
    EE_List = ['Sample']
    
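    # Pair each tag list with its tag name.
    zipped = zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE'))
    # Map every word to its tag, e.g. {'This': 'GG', 'Is': 'FF', ...}.
    tags = {item: tag for tag_list, tag in zipped for item in tag_list}
    # Build new (word, tag) lists in the original order; My_List is not modified.
    res = [[(word, tags[word]) for word in entry if word in tags] for entry in My_List]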
    

    Now (the lookup is case-sensitive, and words that appear in no tag list are dropped):

    >>> res
    [[('This', 'GG'),
      ('Is', 'FF'),
      ('A', 'FF'),
      ('Sample', 'EE'),
      ('Text', 'FF'),
      ('Sentence', 'GG')],
     [('This', 'GG')],
     []]
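
If the untagged words also need to stay in place (for example, so that positions within a sentence are preserved for a later proximity measurement), one possible variant is to look each word up with `dict.get` and fall back to a default instead of filtering. This is only a sketch: the helper name `tag_sentences`, the `tag_lists` dict, and the `default=None` placeholder are assumptions made for illustration and are not part of the original answer.

```python
def tag_sentences(sentences, tag_lists, default=None):
    """Pair every word with its tag, preserving sentence and word order."""
    # Invert {tag: [words]} into {word: tag} for fast lookups.
    word_to_tag = {}
    for tag, words in tag_lists.items():
        for word in words:
            word_to_tag[word] = tag
    # Build new lists, so the input list of lists is left unmodified.
    return [[(word, word_to_tag.get(word, default)) for word in sentence]
            for sentence in sentences]


My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]
tag_lists = {'GG': ['This', 'Sentence'],
             'FF': ['Is', 'A', 'Text'],
             'EE': ['Sample']}

tagged = tag_sentences(My_List, tag_lists)
print(tagged[1])
# [('This', 'GG'), ('too', None), ('is', None), ('a', None), ('sample', None), ('text', None)]
```

Every sentence keeps its original length and order, and the lookup is still case-sensitive, so 'sample' does not match 'Sample'. If case should be ignored, the words could be lowercased on both sides of the lookup.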