Tags: python, frequency, word-count

Count the number of occurrences of strings in one column based on the value in another column (Python)


Sorry in advance for the very basic question. I know there are posts about this issue everywhere, but I can't get past it, even with all the help on those other pages.

For starters, I am a beginner with Python, so sorry for the messy code. What I want is to count the number of times a certain string occurs in column 2 while the value in column 1 stays the same. When that value changes, the counting should start over. It sounds really simple, but I am confused by Python reading my text file as strings (which gives me issues with strip, split, and so on), and I cannot get this code working. Please, someone help out this noob in distress!

Input:

    6    ABMV
    6    ABMV
    6    FOOD
    6    FOOD
    6    IDLE
    10    IDLE
    10    ABMV
    10    IDLE

Code:

    #! /usr/bin/env python

    from collections import Counter

    outfile = open ("counts_outfile.txt", "w")

    with open("test_counts.txt", "r") as infile:
        lines = infile.readlines()
        for i, item in enumerate(lines):
        lines[i] = item.rstrip().split('\t')
        last_chimp = lines[0][0]
        behavior = lines[0][1]
        nr_ABMV = 0
        nr_FOOD = 0
        nr_IDLE = 0

        for lines in infile:
            chimp = lines[0][0]
            behavior = lines[0][1]
            if chimp == last_chimp:
                if behavior == "ABMV":
                    nr_ABMV += 1
                elif behavior == "FOOD":
                    nr_FOOD += 1
                elif behavior == "IDLE":
                    nr_IDLE += 1
                else:
                    continue
        else:
            outline = "chimp_header %s\t%s\t%s\t%s" % (last_chimp, nr_ABMV, nr_FOOD, nr_IDLE)
            outfile.write(outline)
            last_chimp == lines[0][0]
            nr_ABMV = 0
            nr_FOOD = 0
            nr_IDLE = 0

    outfile.close()

Thank you in advance. You will be helping me, and obviously a lot of 'chimps' (chimpanzees), enormously!

Regards,


Solution

Here is an example, very similar to your code:

    # The same format string is used for the header and for every data row;
    # {:>4} right-aligns the chimp ID in a 4-character column
    ROW_FMT = "chimp_header {:>4} {:4} {:4} {:4}\n"

    with open("test_counts.txt", "r") as infile:
        # Keep only non-blank lines, stripped of surrounding whitespace
        lines = [line.strip() for line in infile if line.strip()]

    with open("counts_outfile.txt", "w") as outfile:
        outfile.write(ROW_FMT.format('chimp', 'ABMV', 'FOOD', 'IDLE'))

        last_chimp = lines[0].split()[0]
        behavior = {"ABMV": 0, "FOOD": 0, "IDLE": 0}

        for line in lines:
            # split() with no argument splits on any run of tabs or spaces
            line_split = line.split()
            chimp = line_split[0]

            if chimp != last_chimp:
                # The chimp ID changed: write the finished counts, then reset
                outfile.write(ROW_FMT.format(last_chimp, behavior["ABMV"], behavior["FOOD"], behavior["IDLE"]))
                last_chimp = chimp
                behavior = {"ABMV": 0, "FOOD": 0, "IDLE": 0}
            behavior[line_split[1]] += 1

        # The last chimp is never followed by an ID change, so write it here
        outfile.write(ROW_FMT.format(last_chimp, behavior["ABMV"], behavior["FOOD"], behavior["IDLE"]))
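
The same "restart whenever the chimp ID changes" logic can also be written with `itertools.groupby`, which groups *consecutive* items that share a key. This is a minimal sketch, assuming the two-column, whitespace-separated input shown in the question; the helper name `count_runs` is hypothetical, and the lines are passed in as a list instead of being read from test_counts.txt:

```python
from collections import Counter
from itertools import groupby

BEHAVIORS = ("ABMV", "FOOD", "IDLE")

def count_runs(lines):
    """Hypothetical helper: return (chimp, Counter) pairs, one per consecutive run of a chimp ID.

    `lines` is assumed to be an iterable of strings such as "6\tABMV".
    """
    # Split each non-blank line into [chimp, behavior] on any whitespace
    rows = [line.split() for line in lines if line.strip()]
    results = []
    # groupby starts a new group every time the key (column 1) changes
    for chimp, group in groupby(rows, key=lambda row: row[0]):
        results.append((chimp, Counter(row[1] for row in group)))
    return results

sample = ["6\tABMV", "6\tABMV", "6\tFOOD", "6\tFOOD", "6\tIDLE",
          "10\tIDLE", "10\tABMV", "10\tIDLE"]

for chimp, counts in count_runs(sample):
    # A Counter returns 0 for missing keys, so absent behaviors print as 0
    print("chimp_header {:>4} {:4} {:4} {:4}".format(chimp, *(counts[b] for b in BEHAVIORS)))
```

Because `groupby` only groups adjacent lines, a chimp that shows up again later in the file gets a second, separate row, which matches the behavior asked for in the question.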
    

Here is another example, using Counter and a dictionary:

    from collections import Counter

    with open("test_counts.txt", "r") as infile:
        # Each non-blank line becomes a (chimp, behavior) tuple, e.g. ('6', 'ABMV')
        lines = [tuple(line.split()) for line in infile if line.strip()]

    # One zero-filled dict per chimp; dicts keep insertion order in Python 3.7+
    chimps = {line[0]: {"ABMV": 0, "FOOD": 0, "IDLE": 0} for line in lines}
    # Counter tallies how often each (chimp, behavior) pair occurs in the file
    for k, v in Counter(lines).items():
        chimps[k[0]][k[1]] = v

    ROW_FMT = "chimp_header {:>4} {:4} {:4} {:4}\n"
    with open("counts_outfile.txt", "w") as outfile:
        outfile.write(ROW_FMT.format('chimp', 'ABMV', 'FOOD', 'IDLE'))
        for chimp, counts in chimps.items():
            outfile.write(ROW_FMT.format(chimp, counts["ABMV"], counts["FOOD"], counts["IDLE"]))
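
A somewhat shorter variant of the Counter approach uses `collections.defaultdict` with `Counter` values, so no per-chimp dictionary has to be pre-filled with zeros. This is a sketch under the same assumption of two whitespace-separated columns; `count_per_chimp` is a hypothetical name. Like the Counter example, it totals each chimp over all lines rather than per consecutive run:

```python
from collections import Counter, defaultdict

def count_per_chimp(lines):
    """Hypothetical helper: total behaviors per chimp over all lines."""
    # A chimp ID seen for the first time automatically gets an empty Counter
    totals = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chimp, behavior = fields[0], fields[1]
        totals[chimp][behavior] += 1
    return totals

sample = ["6\tABMV", "6\tABMV", "6\tFOOD", "6\tFOOD", "6\tIDLE",
          "10\tIDLE", "10\tABMV", "10\tIDLE"]

for chimp, counts in count_per_chimp(sample).items():
    # Counter lookups default to 0, so FOOD prints as 0 for chimp 10
    print("chimp_header {:>4} {:4} {:4} {:4}".format(chimp, counts["ABMV"], counts["FOOD"], counts["IDLE"]))
```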
    

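For the sample input, both examples produce the same output. (They are not equivalent in general: the second example totals each chimp over the whole file, whereas the first restarts the count every time the chimp ID changes, so they differ if a chimp appears again later in the file.)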

    chimp_header chimp ABMV FOOD IDLE
    chimp_header    6    2    2    1
    chimp_header   10    1    0    2
    

I hope this gives you some ideas.