How do I iterate through a defaultdict(list) in Python?
Is there a better way of having a dictionary of lists in Python?
I've tried the normal iter(dict), but I got this error:
>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence
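"iteration over non-sequence" is Python 2's message when `iter()` is given an object that is not iterable (Python 3 says "object is not iterable" instead). A `defaultdict(list)` is iterable and yields its keys, and the traceback shows `iter(doc)` rather than the `iter(text.doc)` in the code below, so one plausible reading (an assumption, not confirmed by the question) is that `doc` was the `Paragraphs` wrapper object rather than its `.doc` attribute. A minimal Python 3 sketch with a hypothetical stand-in class:

```python
from collections import defaultdict


class Paragraphs:
    """Hypothetical stand-in for the question's class: it wraps a
    defaultdict(list) but defines no __iter__ of its own."""

    def __init__(self):
        self.doc = defaultdict(list)
        self.doc[1].append("This is a start of a paragraph.\n")


text = Paragraphs()

# Iterating the defaultdict itself works and yields its keys.
keys = list(iter(text.doc))

# Iterating the wrapper object raises TypeError, because the class
# defines neither __iter__ nor __getitem__.
try:
    iter(text)
    wrapper_is_iterable = True
except TypeError:
    wrapper_is_iterable = False

print(keys, wrapper_is_iterable)
```

If the wrapper itself should be iterable, one option is to give it a delegating method such as `def __iter__(self): return iter(self.doc)`.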
The main class:
import para
para.print_doc('./foo/bar/para-lines.txt')
The para.py:
# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict

class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                # separator is a predicate, so call it instead of comparing to it
                if self.separator(line):
                    paraIndex += 1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)
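A minimal Python 3 sketch of building and iterating a `defaultdict(list)` for this kind of input. It assumes paragraphs are separated by blank lines, calls the separator predicate rather than comparing the line to it, and numbers paragraphs and sentences from 1; using `enumerate` also avoids `list.index()`, which returns the first match when a sentence occurs twice in a paragraph. `read_paragraphs` and `format_doc` are hypothetical names, not part of the original recipe.

```python
from collections import defaultdict


def read_paragraphs(lines, is_separator=lambda line: line == '\n'):
    """Group lines into a defaultdict(list) keyed by 1-based paragraph number."""
    doc = defaultdict(list)
    para_index = 1
    for line in lines:
        if is_separator(line):
            # Advance only if the current paragraph has content. Using `in`
            # (not doc[para_index]) avoids creating empty entries as a side effect.
            if para_index in doc:
                para_index += 1
        else:
            doc[para_index].append(line)
    return doc


def format_doc(doc):
    """Return 'Para#p,Sent#s: text' strings, visiting keys in sorted order."""
    out = []
    # items() yields (key, list) pairs; sorting makes the paragraph order explicit.
    for para, sentences in sorted(doc.items()):
        for sent_no, sentence in enumerate(sentences, start=1):
            out.append('Para#%d,Sent#%d: %s' % (para, sent_no, sentence.rstrip('\n')))
    return out


sample = [
    'This is a start of a paragraph.\n',
    'foo barr\n',
    '\n',
    'This is the start of next para.\n',
]
for row in format_doc(read_paragraphs(sample)):
    print(row)
```

With a real file, `read_paragraphs(open(filename))` works the same way, since a file object iterates over its lines.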
An example of ./foo/bar/para-lines.txt looks like this:
This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.
The output of the main class should look like this:
Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.
Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.
The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools like itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby:
import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start=1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i=paranum, n=n, s=sentence))
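The same `itertools.groupby` idea ported to Python 3, sketched as a function that returns a plain list of paragraphs (each a list of lines) instead of a `defaultdict(list)`. Because paragraph numbers are consecutive, a list index can do the job of the dictionary key. `split_paragraphs` is a hypothetical name, and `io.StringIO` stands in for the opened file.

```python
import io
import itertools


def split_paragraphs(lines):
    """Return a list of paragraphs, each a list of its non-separator lines."""
    # groupby yields (key, group) for each run of consecutive lines sharing
    # the same key; here the key is True for blank separator lines.
    return [list(group)
            for is_separator, group in itertools.groupby(lines, lambda line: line == '\n')
            if not is_separator]


f = io.StringIO('This is a start of a paragraph.\nfoo barr\n\n'
                'This is the start of next para.\nthis is the end.\n')
paragraphs = split_paragraphs(f)

for i, paragraph in enumerate(paragraphs, start=1):
    for n, sentence in enumerate(paragraph, start=1):
        # each sentence already ends in '\n', so suppress print's own newline
        print('Para#{i:d},Sent#{n:d}: {s}'.format(i=i, n=n, s=sentence), end='')
```

Each `group` shares the underlying iterator with `groupby`, so it has to be materialized with `list(group)` before `groupby` moves on to the next run; the list comprehension does exactly that.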