Tags: python, python-3.x, text, grouping, defaultdict

Perform calculations on some fields of a text file and group the results in Python


I am trying to group the contents of a text file and perform some calculations on them. I have grouped the data, but the result is still wrong, and I cannot figure out how to perform the calculation on the last field (the amount column).

from collections import defaultdict
data=defaultdict(int)
with open('datafile.txt') as f:
    for line in f:
        group, score, team = line.split(maxsplit=2)
        data[(group.strip(),team.replace('\n','').strip())]+=int(score)
sorteddata = sorted([[k[0],v,k[1]] for k,v in data.items()], key=lambda x:x[1], reverse=True)

for subl in sorteddata:
    print(" ".join(map(str, subl)))

datafile.txt (item, trans, amount)

alpha 1 54,00.01
bravo 3 500,000.00
charlie 1 27,722.29 ($250.45)
charlie 10 252,336,733.383 ($492.06)
delta 2 11 ($10)
echo 5 143,299.00 ($101)
echo 8 145,300 ($125.01)
falcon 3 0.1234
falcon 5 9.19
lima 6 45.00181 ($38.9)
romeo 12 980

Wanted Output: #-- sorted, grouped and calculated

echo    13  288,599.00 ($226.01)      #-- grouped and calculated
romeo   12  980
charlie 11  252,364,455.673 ($742.51) #-- grouped and calculated
falcon  8   9.3134                    #-- grouped and calculated
lima    6   45.00181 ($38.9)
bravo   3   500,000.00
delta   2   11 ($10)
alpha   1   54,00.01

Current Output: #-- partially correct

romeo 12 980
charlie 10 252,336,733.383 ($492.06)
echo 8 145,300 ($125.01)
lima 6 45.00181 ($38.9)
echo 5 143,299.00 ($101)
falcon 5 9.19
bravo 3 500,000.00
falcon 3 0.1234
delta 2 11 ($10)
alpha 1 54,00.01
charlie 1 27,722.29 ($250.45)

Solution

  • There are several issues here, but first and foremost, you're looking to add three separate values for each team, one of which (the amount in parentheses) is optional in your data. Just splitting, grouping, adding and hoping for the best isn't going to cut it.

    When coding, always think through the entire problem and try to visualise / imagine the process in your mind as you're coding it. Each line of code should correspond to something concrete you want to happen. No 'taking a stab at it' in coding.

    It looks like you tried to cover for the optional last column by setting a maxsplit, but you still need to deal with that last field, since you expect it in the output.
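
    To see why the original grouping never merges anything, it may help to look at what maxsplit=2 actually returns for one of the sample lines. This is only an illustration using a line from the question's data; the variable names limited and full are made up for the example:

```python
line = 'charlie 1 27,722.29 ($250.45)\n'

# maxsplit=2 stops after two splits, so everything left stays in one string.
limited = line.split(maxsplit=2)
print(limited)   # ['charlie', '1', '27,722.29 ($250.45)\n']

# A plain split() separates all four fields and discards the newline.
full = line.split()
print(full)      # ['charlie', '1', '27,722.29', '($250.45)']
```

    With maxsplit=2, the third element holds both amounts as a single string. Since the question's code uses that string as part of the dictionary key, every line ends up in its own group.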

    There's also no way that Python just automatically deals with something like ($38.9) and understands it to be a numerical value - you'll have to tell it.
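
    As a rough sketch of what "telling it" could look like, a small helper can strip the decoration before calling float. The name parse_amount is hypothetical and is not used in the program below, which slices the string instead:

```python
def parse_amount(text):
    """Convert a string such as '($38.9)' or '1,234.50' to a float."""
    # Remove surrounding parentheses, a leading dollar sign,
    # and thousands separators before converting.
    cleaned = text.strip().strip('()').lstrip('$').replace(',', '')
    return float(cleaned)

print(parse_amount('($38.9)'))     # 38.9
print(parse_amount('500,000.00'))  # 500000.0
```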

    I assumed input like 54,00.01 is a typo and was supposed to read 54,000.01 since you're using normal English numbers?

    Here's a version of your program somewhat close to what you wrote that works:

    data = {}
    
    with open('datafile.txt') as f:
        for line in f:
            parts = line.split()
            team, a, b, c = parts if len(parts) == 4 else parts + ['($0)']
            data[team] = tuple(map(sum, zip((int(a), float(b.replace(',', '')), float(c[2:-1].replace(',', ''))), data.get(team, (0, 0, 0)))))
    
    data = {t: (a, b, c) for a, b, c, t in reversed(sorted((a, b, c, t) for t, (a, b, c) in data.items()))}
    
    for team, (a, b, c) in data.items():
        print(f'{team:8} {a:4} {b:,} (${c:,})')
    

    Some more changes were required. Note that values for c that end up being 0 are still printed; fixing that is left up to the reader. Result:

    echo       13 288,599.0 ($226.01)
    romeo      12 980.0 ($0.0)
    charlie    11 252,364,455.67299998 ($742.51)
    falcon      8 9.3134 ($0.0)
    lima        6 45.00181 ($38.9)
    bravo       3 500,000.0 ($0.0)
    delta       2 11.0 ($10.0)
    alpha       1 54,000.01 ($0.0)
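
    The trailing digits in 252,364,455.67299998 are a binary floating-point artifact, not an error in the addition. If exact decimal totals matter, one option (not part of the program above, shown only as a sketch) is the standard library's decimal module:

```python
from decimal import Decimal

# Build the values from strings so no binary rounding happens on the way in.
first = Decimal('27722.29')
second = Decimal('252336733.383')

total = first + second
print(f'{total:,}')  # 252,364,455.673
```

    Decimal accepts the same ',' format specifier as float, so the print line would not need to change.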
    

    A few notes on the solution: the adding is done with this line:

    data[team] = tuple(map(sum, zip((int(a), float(b.replace(',', '')), float(c[2:-1].replace(',', ''))), data.get(team, (0, 0, 0)))))
    

    That works by getting the existing tuple for the team, or (0, 0, 0) if there isn't one yet (instead of using the defaultdict). It then zips that tuple with a tuple of the values on the current line, which are converted from their string form into their numerical forms as needed. Then the sum function is mapped onto the pairs of values from both tuples (to add them) and finally, the result is turned into a tuple again.
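
    Broken into separate steps, the same idea might look like the following. The values are taken from the two charlie lines in the sample data, and the names current, existing, pairs and totals are only for illustration:

```python
current = (1, 27722.29, 250.45)          # values parsed from one line
existing = (10, 252336733.383, 492.06)   # running totals stored so far

# zip pairs up the values position by position.
pairs = list(zip(current, existing))
# pairs == [(1, 10), (27722.29, 252336733.383), (250.45, 492.06)]

# sum is applied to each pair, and the results are collected into a tuple.
totals = tuple(map(sum, pairs))
print(totals[0])  # 11
```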

    You seem to be sorting by 'a', but I assumed that if two teams had the same value there, you'd probably want to sort by the two other figures and, all else being equal, by the team name:

    data = {t: (a, b, c) for a, b, c, t in reversed(sorted((a, b, c, t) for t, (a, b, c) in data.items()))}
    

    This works by taking the generated dictionary, turning it into tuples, sorting those, reversing the result and then turning that into a dictionary again (although you could of course just keep the list of tuples and print those).
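
    For the variant that keeps a list instead of rebuilding the dictionary, a minimal sketch could use sorted with a key and reverse=True. The three entries here are a subset of the totals computed above, and rows and names are illustrative names:

```python
data = {
    'romeo': (12, 980.0, 0.0),
    'echo': (13, 288599.0, 226.01),
    'delta': (2, 11.0, 10.0),
}

# Sort by the value tuple first and the team name last, largest first.
rows = sorted(data.items(), key=lambda item: (item[1], item[0]), reverse=True)

names = [team for team, _ in rows]
print(names)  # ['echo', 'romeo', 'delta']
```

    Because team names are unique, this gives the same order as reversing the sorted (a, b, c, t) tuples.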

    Printing in a nice format is straightforward:

        print(f'{team:8} {a:4} {b:,} (${c:,})')
    

    This is a so-called f-string, which takes care of formatting the floats with commas and sets a fixed width for the team and count columns, as in your example output.
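
    One possible way to handle the zero values of c that are still printed is to build the row in two parts. This is a sketch only; format_row is a hypothetical helper, not part of the program above:

```python
def format_row(team, count, amount, extra):
    """Format one output row; the ($...) part is omitted when extra is 0."""
    # :8 pads the team name to 8 characters, :4 right-aligns the count,
    # and :, inserts thousands separators.
    row = f'{team:8} {count:4} {amount:,}'
    if extra:
        row += f' (${extra:,})'
    return row

print(format_row('bravo', 3, 500000.0, 0.0))
print(format_row('lima', 6, 45.00181, 38.9))
```

    This treats an amount of exactly 0 the same as a missing one, which may or may not be what you want if ($0) can appear in the input.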