I'm very new to Python (and very rusty on scripting, my last trials were bash and Perl in 2001 or so), have tried searching SO already -- but honestly didn't even know what to look for. I'm fairly sure it's trivial - and I'm a bit ashamed.
I have a fairly large CSV file (approx 26k rows), in a tab-separated format:
name, current_value, current_pct, change_pct
ItemA 2452434324 7,70% -1,19
ItemB 342331 2,40% -0,45
ItemC 32412123 3,90% 3,87
ItemD 0 0 -4,52
ItemE 12318231 14,80% 0
ItemA 542312134 1,60% 0,11
ItemC 2423423425 11,21% -0,01
ItemE 3141888103 30,00% 0
ItemB 78826 1,01% 12,01
ItemA 89937 0,04% 0
...
There's total of approx 300 "Items" (repeating, but in various order, and sometimes only appearing once-or-twice), each of them has a "current value" (integer, from 0 to approx 1000 million (or 1 billion/milliard)), current percentage value (not interesting to me at the moment), and percentage change from last reading (different file, not interesting to me at the moment).
What I want to achieve is a sum of 'change_pct
' column for each Item
, so for aforementioned example result would be:
name total_pct_change
ItemA -1,08
ItemB 11,56
ItemC 3,86
ItemD -4,52
ItemE 0
I was planning to create a list of items
then just sum the values of row[3]
for these, but I fail miserably.
What I have until now:
import csv, sys, string
xlsfile = sys.argv[1]
with open(xlsfile, 'rb') as f:
reader = csv.reader(f, delimiter='\t')
item = row[0]
pct_change = row[3]
# this is where I draw a blank
# was thinking of something akin to
# foreach item do sum(pct_change)
# but that's obviously wrong
print item, sum_pct_change
f.close()
A pandas solution that works:
import pandas as pd
with open(xlsfile) as fobj:
header = [entry.strip() for entry in next(fobj).split(',')]
data = pd.read_csv(xlsfile, delim_whitespace=True, decimal=',', names=header, skiprows=1)
summed = data.groupby(by=['name'])['change_pct'].sum()
print(summed)
Output:
name
ItemA -1.08
ItemB 11.56
ItemC 3.86
ItemD -4.52
ItemE 0.00
Name: change_pct, dtype: float64
EDIT
If your file is ;
-separated. This should work:
data = pd.read_csv('pct2.csv', sep=';', decimal=',')
summed = data.groupby(by=['name'])['change_pct'].sum()
print(summed)