python  python-3.x  nlp  linguistics

Finding total count for word form when many possible POS tags


I feel like I have a dumb question, but here goes anyway. I'm trying to go from data that looks something like this:

a word form     lemma    POS                count of occurrence
same word form  lemma    Not the same POS   another count
same word form  lemma    Yet another POS    another count

to a result that looks like this:

the word form    total count    all possible POS and their individual counts 

So for example I could have:

ring     total count = 100        noun = 40, verb = 60

I have my data in a CSV file. I want to do something like this:

for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count

but I can't seem to figure out how to do that. Help?


Solution

  • If I understood correctly, the simplest way to achieve what you need would be:

    # Mocked CSV data: [word form, lemma, POS, count]
    data = [
        ['a', 'lemma', 'pos', 1],
        ['a', 'lemma', 'pos1', 2],
        ['a', 'lemma', 'pos2', 3],
        ['b', 'lemma', 'pos', 5],
    ]

    # Sum the counts per word form
    result = {}

    for row in data:
        key = row[0]
        count = row[3]
        if key in result:
            result[key] += count
        else:
            result[key] = count

    print(result)
    

    Result:

    {'a': 6, 'b': 5}
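
The loop above gives the total per word form, but the question also asks for the individual count of each POS. One way to get both is sketched below, using `collections.Counter` nested in a `defaultdict`. This is only a sketch under some assumptions: the column order (word form, lemma, POS, count) is taken from the question, the sample rows and the `count_by_form` helper are made up for illustration, and `io.StringIO` stands in for the real CSV file.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for the CSV file; the column order
# (word form, lemma, POS, count) is assumed from the question.
SAMPLE_CSV = """ring,ring,noun,40
ring,ring,verb,60
bell,bell,noun,5
"""


def count_by_form(rows):
    """Return {word_form: Counter({pos: count})} for rows of (form, lemma, pos, count)."""
    per_pos = defaultdict(Counter)
    for form, _lemma, pos, count in rows:
        # csv.reader yields strings, so the count has to be converted
        per_pos[form][pos] += int(count)
    return per_pos


# With a real file, open(path, newline="") would take the place of io.StringIO(...)
per_pos = count_by_form(csv.reader(io.StringIO(SAMPLE_CSV)))

for form, pos_counts in per_pos.items():
    total = sum(pos_counts.values())  # total count is the sum over all POS tags
    breakdown = ", ".join(f"{pos} = {n}" for pos, n in pos_counts.items())
    print(f"{form}  total count = {total}  {breakdown}")
```

Because the rows are grouped by key in a dictionary, identical word forms don't need to sit on adjacent rows, so there is no need to compare each row with the next one. If the real file has a header row or blank lines, those would need to be skipped before unpacking.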