Tags: python, iterator, python-itertools

Unable to use value from two iterators in same loop


I have the following Python script, in which I want to take values from two iterators in step, one value from each per loop iteration.

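# Imports the snippet relies on (NLTK assumed for stopwords and stemming).
import itertools
import re
from collections import defaultdict
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

filename = "small"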
with open(filename,'r') as plot_data:
    main_dict = dict()
    line_one = itertools.islice(plot_data, 0, None, 4)
    line_two = itertools.islice(plot_data, 2, None, 4)
    dictionary = defaultdict(list)
    #take values from iterators alternatively.
    for movie_name, movie_plot in itertools.izip(line_one, line_two):
        movie_plot = movie_plot.lower()
        words = re.findall(r'\w+', movie_plot, flags = re.UNICODE | re.LOCALE)
        elemStopW = filter(lambda x: x not in stopwords.words('english'), words)
        #list of words.
        print elemStopW
        for word in elemStopW:
            word = PorterStemmer().stem_word(word)
            dictionary[movie_name].append(word)
            main_dict[word] = len(main_dict)
    print main_dict 

This script is not printing anything. I do not understand why. I do not want to merge the iterators as I want to use both the values in the same loop.

Any help appreciated.

EDIT: To clarify (as asked in the comments), the following script, which uses only one of the iterators, works fine:

filename = "small"
with open(filename,'r') as plot_data:
    main_dict = dict()
    line_one = itertools.islice(plot_data, 0, None, 4)
    dictionary = defaultdict(list)
    for movie_name in line_one:
        print movie_name

Solution

  • This will probably not do what you expect:

    line_one = itertools.islice(plot_data, 0, None, 4)
    line_two = itertools.islice(plot_data, 2, None, 4)
    

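    Since plot_data is a file object, both islice objects pull from the same underlying iterator, so reading from either one advances the file for both. The file is still read sequentially, once, rather than twice in parallel. Here, line_one takes line 0, line_two then skips lines 1 and 2 and returns line 3, line_one skips lines 4 to 6 and returns line 7, and so on.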

    You can use itertools.tee to duplicate the file iterator so that it can be read twice in parallel:

    plot1, plot2 = itertools.tee(plot_data, 2)
    line_one = itertools.islice(plot1, 0, None, 4)
    line_two = itertools.islice(plot2, 2, None, 4)
    

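    Note that tee has to buffer every item that one copy has consumed and the other has not. If the positions of the two iterators can drift far apart, this can take a lot of memory and you're better off opening the file twice. In this case the two iterators stay within a few lines of each other, so this shouldn't be a problem.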
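
As a rough illustration of the point above, the following sketch compares the shared-iterator version with the tee version on an in-memory stand-in for the file. It is written for Python 3, so the built-in zip stands in for itertools.izip. The 12-line sample and the helper names pairs_shared and pairs_tee are made up for this example and are not part of the original script.

```python
import io
import itertools

# Hypothetical stand-in for the file: 12 numbered lines.
SAMPLE = "".join("line %d\n" % i for i in range(12))


def pairs_shared(stream):
    """Both slices pull from the same iterator, so each one
    consumes lines the other was supposed to see."""
    line_one = itertools.islice(stream, 0, None, 4)
    line_two = itertools.islice(stream, 2, None, 4)
    return [(a.strip(), b.strip()) for a, b in zip(line_one, line_two)]


def pairs_tee(stream):
    """tee gives each slice its own independent copy of the iterator."""
    copy_one, copy_two = itertools.tee(stream, 2)
    line_one = itertools.islice(copy_one, 0, None, 4)
    line_two = itertools.islice(copy_two, 2, None, 4)
    return [(a.strip(), b.strip()) for a, b in zip(line_one, line_two)]


# A fresh StringIO per call, so each function starts at the first line.
shared_result = pairs_shared(io.StringIO(SAMPLE))
tee_result = pairs_tee(io.StringIO(SAMPLE))

print(shared_result)
print(tee_result)
```

On this sample the shared version pairs lines (0, 3) and (7, 11), while the tee version pairs (0, 2), (4, 6) and (8, 10), which is the pairing the question is after.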