I would like to compare a line to the previous one without storing anything in memory (no dictionaries).
Sample data:
a 2
file 1
file 2
file 4
for 1
has 1
is 2
lines 1
small 1
small 2
test 1
test 2
this 1
this 2
two 1
Pseudocode:
for line in sys.stdin:
    word, count = line.split()
    if word == previous_word:
        print(word, count1+count2)
I know I would use enumerate or dict.iteritems over an array, but I can't on sys.stdin.
Desired output:
a 2
file 7
for 1
has 1
is 2
lines 1
small 3
test 3
this 3
two 1
The basic logic is to keep track of the previous word. If the current word matches, accumulate the count. If not, print the previous word and its count, and start over. There's a little special code to handle the first and last iterations.
stdin_data = [
    "a 2",
    "file 1",
    "file 2",
    "file 4",
    "for 1",
    "has 1",
    "is 2",
    "lines 1",
    "small 1",
    "small 2",
    "test 1",
    "test 2",
    "this 1",
    "this 2",
    "two 1",
]
previous_word = ""
word_ct = 0
for line in stdin_data:
    word, count = line.split()
    if word == previous_word:
        word_ct += int(count)
    else:
        if previous_word != "":
            print(previous_word, word_ct)
        previous_word = word
        word_ct = int(count)

# Print the final word and count
print(previous_word, word_ct)
Output:
a 2
file 7
for 1
has 1
is 2
lines 1
small 3
test 3
this 3
two 1
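For comparison, the same grouping can be sketched with itertools.groupby from the standard library. This is an alternative technique, not the loop shown above. The function name sum_adjacent and the short sample list are made up for illustration, and the sketch assumes every non-blank line is exactly "word count" and that equal words are already adjacent (sorted input), as in the sample data.

```python
from itertools import groupby


def sum_adjacent(lines):
    """Yield (word, total) for each run of consecutive lines sharing a word.

    Hypothetical helper: assumes each non-blank line is "word count".
    """
    # Lazily split each non-blank line into [word, count]; nothing is stored up front.
    pairs = (line.split() for line in lines if line.strip())
    # groupby only merges adjacent equal keys, so equal words must be next to each other.
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


# Small made-up sample; any iterable of lines works here.
sample = ["file 1", "file 2", "file 4", "for 1"]
for word, total in sum_adjacent(sample):
    print(word, total)
# prints:
# file 7
# for 1
```

Because sum_adjacent accepts any iterable of lines, sys.stdin (after import sys) can be passed in place of sample. Lines are then consumed one at a time, and only the current run of equal words is held at any moment.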