I have the following Python script, in which I want to take values from two iterators alternately:
```python
import itertools
import re
from collections import defaultdict

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

filename = "small"
with open(filename, 'r') as plot_data:
    main_dict = dict()
    line_one = itertools.islice(plot_data, 0, None, 4)
    line_two = itertools.islice(plot_data, 2, None, 4)
    dictionary = defaultdict(list)
    # take values from the iterators alternately
    for movie_name, movie_plot in itertools.izip(line_one, line_two):
        movie_plot = movie_plot.lower()
        words = re.findall(r'\w+', movie_plot, flags=re.UNICODE | re.LOCALE)
        elemStopW = filter(lambda x: x not in stopwords.words('english'), words)
        # list of words
        print elemStopW
        for word in elemStopW:
            word = PorterStemmer().stem_word(word)
            dictionary[movie_name].append(word)
            main_dict[word] = len(main_dict)
print main_dict
```
This script does not print anything, and I do not understand why. I do not want to merge the iterators, because I want to use both values in the same loop.
Any help is appreciated.
EDIT: To clear up some confusion (as in the comments), the following script works fine:
```python
import itertools
from collections import defaultdict

filename = "small"
with open(filename, 'r') as plot_data:
    main_dict = dict()
    line_one = itertools.islice(plot_data, 0, None, 4)
    dictionary = defaultdict(list)
    for movie_name in line_one:
        print movie_name
```
This will probably not do what you expect:
```python
line_one = itertools.islice(plot_data, 0, None, 4)
line_two = itertools.islice(plot_data, 2, None, 4)
```
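Since `plot_data` is a file object, both `islice` objects pull from the same underlying iterator, so reading from either one advances the shared file position. The file is still read sequentially, once, rather than twice in parallel, and every line one iterator consumes (including the lines it skips) is gone for the other.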
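To see the effect, here is a small sketch in Python 3 (where the built-in `zip` replaces `itertools.izip`). It assumes a hypothetical 12-line in-memory file, built with `io.StringIO`, standing in for `small`, laid out as three 4-line records:

```python
import io
import itertools

# Hypothetical stand-in for the file "small": 12 lines, i.e. three 4-line records.
data = "".join("line%d\n" % i for i in range(12))
plot_data = io.StringIO(data)

# Both islice objects wrap the SAME iterator, so every line that one of them
# skips or yields is no longer available to the other.
line_one = itertools.islice(plot_data, 0, None, 4)
line_two = itertools.islice(plot_data, 2, None, 4)

pairs = [(a.strip(), b.strip()) for a, b in zip(line_one, line_two)]
print(pairs)  # [('line0', 'line3'), ('line7', 'line11')]
```

Instead of pairing lines 0/2, 4/6 and 8/10, the two iterators take turns consuming the single file, so they pair lines 0/3 and 7/11 and then run out.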
You can use `itertools.tee` to duplicate the file iterator so that it can be read twice in parallel:
```python
plot1, plot2 = itertools.tee(plot_data, 2)
line_one = itertools.islice(plot1, 0, None, 4)
line_two = itertools.islice(plot2, 2, None, 4)
```
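As a self-contained check, here is the same `tee` approach as a Python 3 sketch, again assuming a hypothetical 12-line in-memory file in place of `small`, where line 0 of each 4-line record is the name and line 2 is the plot:

```python
import io
import itertools

# Hypothetical stand-in for "small": three 4-line records.
data = "".join("line%d\n" % i for i in range(12))
plot_data = io.StringIO(data)

# tee returns two independent iterators over the one file object; lines read
# by one copy are buffered until the other copy has read them too.
plot1, plot2 = itertools.tee(plot_data, 2)
line_one = itertools.islice(plot1, 0, None, 4)
line_two = itertools.islice(plot2, 2, None, 4)

pairs = [(a.strip(), b.strip()) for a, b in zip(line_one, line_two)]
print(pairs)  # [('line0', 'line2'), ('line4', 'line6'), ('line8', 'line10')]
```

Each iterator now sees every line of the file, so the records pair up as intended. After calling `tee`, the original `plot_data` iterator should not be read directly, or the two copies will miss those lines.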
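Note that `tee` has to buffer every item that one copy has consumed but the other has not yet seen. If the positions of the two iterators can drift far apart, this can take a lot of memory, and you are better off opening the file twice. Here the two iterators never get more than a few lines apart, so this shouldn't be a problem.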
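For comparison, opening the file twice could look like the following Python 3 sketch. The helper `name_plot_pairs` is a hypothetical name, and a temporary file with the same assumed 4-line record layout stands in for `small`:

```python
import itertools
import os
import tempfile


def name_plot_pairs(path):
    """Hypothetical helper: pair line 0 and line 2 of every 4-line record.

    Each open() call returns a separate file object with its own read
    position, so the two islice iterators never consume each other's lines
    and nothing has to be buffered in memory.
    """
    with open(path) as names_file, open(path) as plots_file:
        names = itertools.islice(names_file, 0, None, 4)
        plots = itertools.islice(plots_file, 2, None, 4)
        return [(n.rstrip("\n"), p.rstrip("\n")) for n, p in zip(names, plots)]


# Hypothetical sample file standing in for "small".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line%d\n" % i for i in range(12)))
    path = tmp.name

try:
    pairs = name_plot_pairs(path)
    print(pairs)  # [('line0', 'line2'), ('line4', 'line6'), ('line8', 'line10')]
finally:
    os.remove(path)
```

The trade-off is two open file handles instead of an in-memory buffer, which keeps memory use constant no matter how far apart the two reading positions are.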