Tags: python, bash, stdin, sys

For loop through stdin using the previous item


I would like to compare each line read from stdin with the previous one, without loading the whole input into memory (no dictionaries).

Sample data:

a   2
file    1
file    2
file    4
for 1
has 1
is  2
lines   1
small   1
small   2
test    1
test    2
this    1
this    2
two 1

Pseudocode:

for line in sys.stdin:
    word, count = line.split()
    if word == previous_word:
        print(word, count1+count2)

With a list or a dict I would use enumerate or dict.iteritems, but I don't see how to do the same on sys.stdin.

Desired output:

a   2
file    7
for 1
has 1
is  2
lines   1
small   3
test    3
this    3
two 1

Solution

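  • The basic logic is to keep track of the previous word and a running count. If the current word matches the previous one, add its count to the running total. If not, print the previous word with its total and start a new total for the current word. Two cases need special handling: on the first line there is no previous word to print, and the last word has to be printed after the loop ends. This relies on the input already being sorted (grouped) by word, as in the sample data.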

    stdin_data = [
        "a   2",
        "file    1",
        "file    2",
        "file    4",
        "for 1",
        "has 1",
        "is  2",
        "lines   1",
        "small   1",
        "small   2",
        "test    1",
        "test    2",
        "this    1",
        "this    2",
        "two 1",
    ]  
    
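    previous_word = ""
    word_ct = 0
    
    # stdin_data stands in for sys.stdin here; iterating sys.stdin works the same way.
    for line in stdin_data:
        word, count = line.split()
        if word == previous_word:
            # Same word as the previous line: add to the running total.
            word_ct += int(count)
        else:
            # New word: print the finished total (skipped on the first line).
            if previous_word != "":
                print(previous_word, word_ct)
            previous_word = word
            word_ct = int(count)
    
    # Print the final word and count (skipped if the input was empty)
    if previous_word != "":
        print(previous_word, word_ct)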
    

    Output:

    a 2
    file 7
    for 1
    has 1
    is 2
    lines 1
    small 3
    test 3
    this 3
    two 1
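
A minimal sketch of the same previous-word logic, written as a generator over any iterable of lines so it can be pointed at sys.stdin directly instead of a list. The names merge_adjacent_counts and main are hypothetical, and skipping lines that do not have exactly two fields is an assumption that the question does not state.

```python
import sys


def merge_adjacent_counts(lines):
    """Yield (word, total) for each run of consecutive lines sharing a word.

    Assumes the input is already sorted (grouped) by word.
    """
    previous_word = None
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Assumption: blank or malformed lines are ignored.
            continue
        word, count = parts[0], int(parts[1])
        if word == previous_word:
            # Same word as the previous line: keep accumulating.
            total += count
        else:
            # New word: emit the finished run, if there was one.
            if previous_word is not None:
                yield previous_word, total
            previous_word, total = word, count
    # Emit the last run (nothing is emitted for empty input).
    if previous_word is not None:
        yield previous_word, total


def main():
    # sys.stdin is iterated line by line, so the input is never held in full.
    for word, total in merge_adjacent_counts(sys.stdin):
        print(word, total)
```

With main() called at the bottom of a script, sorted input can be piped in, and only the current word and its running total are kept at any time.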