Grouping and computing sorted data items in a text file in Python


The snippet below sorts the data from data.txt by score. I have been trying to improve the script so that rows belonging to the same group are merged and their scores added together.

import re
data = []
with open('data.txt') as f:
    for line in f:
        group, score, team = re.split(r' (-?\d*\.?\d+) ', line.strip('\n').strip())
        data.append((int(score), group.strip(), team.strip()))

data.sort(reverse=True)
print("Top Scores:")
for (score, group, team), _ in zip(data, range(100)):
    print(f'{group} - {score} - {team}')  

Source File (data.txt):

alpha 1 dream team
bravo 3 never mind us
charlie 1 diehard  
delta 2 just cool
echo 5 dont do it
falcon 3 your team
lima 6 allofme
charlie 10 diehard
romeo 12 justnow
echo 8 dont do it

Current Output:

Top Scores:
romeo - 12 - justnow
charlie - 10 - diehard
echo - 8 - dont do it
lima - 6 - allofme
echo - 5 - dont do it
falcon - 3 - your team
bravo - 3 - never mind us
delta - 2 - just cool
charlie - 1 - diehard
alpha - 1 - dream team

Wanted Output: #-- grouped and totalled

echo 13 dont do it   #-- totalled since repeating
romeo 12 justnow
charlie 11 diehard   #-- totalled since repeating
lima 6 allofme
bravo 3 never mind us
falcon 3 your team
delta 2 just cool
alpha 1 dream team

Solution

  • Use a dictionary for grouping (a defaultdict in this case), then convert it back to a list for sorting. Incidentally, you don't need a regex for a simple split like this one:

    from collections import defaultdict

    # Sum the scores per (group, team) pair
    data = defaultdict(int)
    with open('data.txt') as f:
        for line in f:
            group, score, team = line.split(maxsplit=2)
            data[(group.strip(), team.strip())] += int(score)
    # Back to a list of [group, total, team], highest total first
    sorteddata = sorted([[k[0], v, k[1]] for k, v in data.items()], key=lambda x: x[1], reverse=True)
    
    >>> sorteddata
    [['echo', 13, 'dont do it'], ['romeo', 12, 'justnow'], ['charlie', 11, 'diehard'], ['lima', 6, 'allofme'], ['bravo', 3, 'never mind us'], ['falcon', 3, 'your team'], ['delta', 2, 'just cool'], ['alpha', 1, 'dream team']]
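
To get from the sorted list to the plain-text layout asked for in the question, the rows still need to be printed one per line. Below is a minimal end-to-end sketch of the same defaultdict approach. It assumes every non-blank line has the `group score team` layout shown in data.txt; the names `SAMPLE` and `total_scores` are illustrative and not part of the original answer, and the sample rows are embedded as a string only so the snippet runs without the file.

```python
from collections import defaultdict

# Sample rows copied from data.txt in the question; in the real script
# these lines would come from open('data.txt') instead.
SAMPLE = """\
alpha 1 dream team
bravo 3 never mind us
charlie 1 diehard
delta 2 just cool
echo 5 dont do it
falcon 3 your team
lima 6 allofme
charlie 10 diehard
romeo 12 justnow
echo 8 dont do it
"""


def total_scores(lines):
    """Sum scores per (group, team) and return rows sorted by total, descending.

    total_scores is a hypothetical helper name, not from the original answer.
    """
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing on unpacking
        group, score, team = line.split(maxsplit=2)
        totals[(group, team.strip())] += int(score)
    # sorted() is stable, so groups with equal totals keep first-seen order
    return sorted(
        ((group, total, team) for (group, team), total in totals.items()),
        key=lambda row: row[1],
        reverse=True,
    )


rows = total_scores(SAMPLE.splitlines())
for group, total, team in rows:
    print(f'{group} {total} {team}')
```

Because the sort is stable and dictionaries keep insertion order, groups with equal totals (bravo and falcon here) should appear in the order they first occur in the file, which matches the wanted output. To read from the actual file, `total_scores(open('data.txt'))` can be used in place of the embedded sample, ideally inside a `with` block.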