I am trying to reproduce Excel's SUMIFS function, which would be approximately: `accumulation1 = SUMIFS(value; $fin$1:$fin$5; ini$1)`
What the formula does: for each row, it looks up every row whose `fin` matches that row's `ini` and adds up their values.
Example calculation for id 3, accumulation1: find and add the values of the rows whose endpoint matches (`fin` = `ini` = 11), i.e. the values of id 1 and id 5: 5 + 3 = 8.
Then I create a new accumulation column and repeat the same computation (I have to do this 1004 times...).
id | ini | fin | value | accumulation1 | accumulation2 | sumOfAccumulation |
---|---|---|---|---|---|---|
1 | 10 | 11 | 5 | 0 | 0 | 5 |
2 | 9 | 10 | 0 | 0 | 0 | 0 |
3 | 11 | 12 | 2 | 8 | 0 | 10 |
4 | 12 | 13 | 1 | 2 | 8 | 11 |
5 | 5 | 11 | 3 | 0 | 0 | 3 |
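For context, one accumulation step of this SUMIFS translates to a groupby-plus-map in pandas; a minimal sketch using the table above (column names taken from the table):

```python
import pandas as pd

# The example table from above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'ini': [10, 9, 11, 12, 5],
    'fin': [11, 10, 12, 13, 11],
    'value': [5, 0, 2, 1, 3],
})

# SUMIFS(value; fin-range; ini): for each row, sum the values
# of all rows whose fin equals this row's ini
sums_by_fin = df.groupby('fin')['value'].sum()
df['accumulation1'] = df['ini'].map(sums_by_fin).fillna(0).astype(int)
# accumulation1 is now [0, 0, 8, 2, 0], matching the table
```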
I have the code for one accumulation, which looks like this for now:
import time
import psycopg2
import psycopg2.extras
import pandas.io.sql as pdsql

start_time = time.time()
connection = psycopg2.connect(dbname=DB_NAME, user=DB_USER, password=DB_PWD,
                              host=DB_HOST, port=DB_PORT)
cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
data = pdsql.read_sql_query(
    "select id_bdcarth, id_nd_ini::int ini, id_nd_fin::int fin, v"
    " from tempturbi.tmp_somme_v19", connection)

Endtest = 1
# loop until Endtest == 0:
# create a new accumulation column each pass
for i in data.ini:
    acc = data.v.loc[data.fin == i]  # get values of the upstream segments
    acc = acc.sum()
    # save acc in the accumulation column (not done yet)
# Endtest = data.accumulation.sum()
print("--- %s seconds ---" % (time.time() - start_time))
Even without saving the calculated results, the script takes 129 seconds to run, which is much slower than Excel. Is there any way to improve the script and make it faster?
What I'm trying to do is walk along a river network and compute values.
Thanks again for clarifying your question. I think I understand now, and this approach matches the output you've shown. Please let me know if I misunderstood, and also whether it is faster than your approach; I'm not sure it will be.
import pandas as pd

# Create the test data
df = pd.DataFrame({
    'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'ini': {0: 10, 1: 9, 2: 11, 3: 12, 4: 5},
    'fin': {0: 11, 1: 10, 2: 12, 3: 13, 4: 11},
    'value': {0: 5, 1: 0, 2: 2, 3: 1, 4: 3},
})

# Set up initial values
curr_value_col = 'value'
i = 0
all_value_cols = []

# The groupings stay the same throughout the loops,
# so we can group once and reuse it for a speed benefit
gb = df.groupby('fin')

# Loop forever until we break
while True:
    # Update the loop number and add to the value col list
    i += 1
    all_value_cols.append(curr_value_col)
    # Group by fin and sum the current value column
    fin_cumsum = gb[curr_value_col].sum()
    # Map the sums to the new column
    next_val_col = 'accumulation{}'.format(i)
    df[next_val_col] = df['ini'].map(fin_cumsum).fillna(0).astype(int)
    # If the new column we added sums to 0, then quit
    # (I think this is what you were saying you wanted, but I'm not sure)
    curr_value_col = next_val_col
    if df[curr_value_col].sum() == 0:
        break

# Get the cumulative sum from the list of columns we've been saving
df['sumOfAccumulation'] = df[all_value_cols].sum(axis=1)
df
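On the 129-second runtime: your original loop rescans the whole frame once per row (`data.v.loc[data.fin == i]` walks all n rows for each of the n values of `ini`), while a single `groupby` computes every per-`fin` sum in one pass. A minimal timing sketch on synthetic data (sizes and column names here are illustrative, not from your table):

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2_000
data = pd.DataFrame({
    'ini': rng.integers(0, n, n),
    'fin': rng.integers(0, n, n),
    'v': rng.integers(0, 10, n),
})

# Per-row filtering: one full scan of the frame for every row
t0 = time.time()
slow = [data.v.loc[data.fin == i].sum() for i in data.ini]
t_slow = time.time() - t0

# Single groupby + map: one pass over the frame
t0 = time.time()
fast = data['ini'].map(data.groupby('fin')['v'].sum()).fillna(0)
t_fast = time.time() - t0

# Both give the same per-row sums; the groupby version scales
# roughly O(n) instead of O(n^2)
```

The gap grows with the number of rows, which is why the groupby-based loop above should beat the per-row version by a wide margin on a real network.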