This is assignment 10.2 from Chapter 10 of Python For Everyone. The problem states:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon. "From person@example.com Sat Jan 5 09:14:16 2008" Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
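For reference, this is how I understand the two splits on the sample line from the problem statement. It is only a minimal sketch: `time_field` is a name I picked, and `print(...)` is written with parentheses so it runs under Python 3 as well.

```python
# Sample 'From ' line taken from the problem statement
line = "From person@example.com Sat Jan 5 09:14:16 2008"

# First split: on whitespace, which puts the time at index 5
parts = line.split()
time_field = parts[5]

# Second split: on the colon, which puts the hour at index 0
hour = time_field.split(":")[0]
print(hour)
```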
The desired output is
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
My code is here:
```python
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
for line in handle:
    line = line.rstrip()
    if line.startswith("From "):
        parts = line.split()
        # print parts
        time = parts[5]
        pieces = time.split(':')
        hour = pieces[0]
        counts[hour] = counts.get(hour,0)+1
print counts
```
The text file can be found here http://www.pythonlearn.com/code/mbox-short.txt
When debugging, I noticed that my program seems to go through each line many times, and the counts it returns for each hour are far too high. I'm sure that the check line.startswith("From ") is correct for reading only the intended lines, because I used it in a previous assignment.
How can I get the correct frequency for hours?
Your code works fine when I run it.
As for the output: a dictionary is unsorted. You can use sorted(counts), which returns a sorted list of the keys. With these you can print your dict in sorted order:
```python
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
counts = dict()
for line in handle:
    line = line.rstrip()
    if line.startswith("From "):
        parts = line.split()
        time = parts[5]
        pieces = time.split(':')
        hour = pieces[0]
        counts[hour] = counts.get(hour,0)+1
for key in sorted(counts):
    print key + " " + str(counts[key])
```
The output is:
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
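If the counts you see are still too high, one guess (and it is only a guess, since indentation does not always survive pasting) is that the increment line in your file sits outside the `if` block, so it runs for every line instead of only for the 'From ' lines. The sketch below contrasts the two cases. The sample lines and both function names are made up for illustration, and it is written so that it runs under Python 3.

```python
# Hypothetical sample lines standing in for mbox-short.txt
SAMPLE = [
    "From a@example.com Sat Jan  5 09:14:16 2008",
    "Return-Path: <postmaster@example.com>",
    "From: a@example.com",
    "From b@example.com Fri Jan  4 18:10:48 2008",
    "Subject: test",
]


def count_hours(lines):
    """Count messages per hour; the increment is inside the `if`."""
    counts = {}
    for line in lines:
        line = line.rstrip()
        if line.startswith("From "):
            hour = line.split()[5].split(":")[0]
            counts[hour] = counts.get(hour, 0) + 1
    return counts


def count_hours_misindented(lines):
    """Deliberately wrong: the increment is dedented out of the `if`."""
    counts = {}
    for line in lines:
        line = line.rstrip()
        if line.startswith("From "):
            hour = line.split()[5].split(":")[0]
        # Runs for every line, so the last hour seen is counted again.
        # (Only safe here because the first sample line is a 'From ' line.)
        counts[hour] = counts.get(hour, 0) + 1
    return counts


good = count_hours(SAMPLE)
bad = count_hours_misindented(SAMPLE)
for key in sorted(good):
    print(key + " " + str(good[key]) + " (misindented: " + str(bad[key]) + ")")
```

In the misindented version every header line adds one more to the most recent hour, which would match the "way too high" numbers described in the question.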