Goal: Here is a sample of a dataset that has "ID", "PHASENAME", "CDAYS", "MULTI_FACTOR", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". The goal is to take every "ID", "PHASENAME", "CDAYS" and increase the "DAY_COUNTER" from whatever the last day is increase it to 100 days. I am also to repeat this for "DAILY_LABOR_PERCENT".
currently
ID | PHASENAME | C_DAYS | Multi_Factor | DAY_COUNTER | DAILY_LABOR_% |
---|---|---|---|---|---|
BAR | C | 14.0 | 7.142857 | 1 | 1.0 |
BAR | C | 14.0 | 7.142857 | 2 | 5.0 |
BAR | C | 14.0 | 7.142857 | 13 | 4.0 |
BAR | C | 14.0 | 7.142857 | 14 | 0.0 |
need it to be
ID | PHASENAME | C_DAYS | Multi_Factor | DAY_COUNTER | DAILY_LABOR_% |
---|---|---|---|---|---|
BAR | C | 14.0 | 7.142857 | 1 | .14 |
BAR | C | 14.0 | 7.142857 | 2 | some number |
BAR | C | 14.0 | 7.142857 | 99 | some number |
BAR | C | 14.0 | 7.142857 | 100 | 0.0 |
In this sample dataset, this person worked for 14 days and their labor percentage is recorded for each day. Instead of have 14 days of labor, I want to have 100 days of labor ( formulas: 'DAY_COUNTER' * 'Multi_Factor' and "DAILY_LABOR_PERCENT" / 'Multi_Factor ).
Also, I'm worried that once I enlarge this dataset having the range as 100 won't work.
Looking for
code
import pandas as pd
import numpy as np
data={
"ID": [ "BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR"],
"PHASENAME": [ "C","C","C","C","C","C","C","C","C","C","C","C","C","C"],
"C_DAYS": [ 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
"Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
"DAY_COUNTER": [1,2,3,4,5,6,7,8,9,10,11,12,13,14],
"DAILY_LABOR_PERCENT": [1.0,5.0,9.0,11.0,10.0,9.0,9.0,9.0,8.0,10.0,8.0,7.0,4.0,0.0],
}
df=pd.DataFrame(data)
df1=df.copy()
n = 100
first_day = df["DAY_COUNTER"].iloc[0]
rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]
out["DAY_COUNTER"] = range(first_day, n + first_day)
def labor_factor(row):
return row['DAILY_LABOR_PERCENT'] / row['Multi_Factor']
#creating empty list to store rows for each id, phasename, and cdays
new_rows= []
#iterating through each index range and applying function to get day_counter to 100
for i in range(len(out)):
out['new_labor']= out.apply(labor_factor,axis=1)
As a general rule, you want to avoid iterating over dataframe rows, as that's very inefficient.
IIUC, you can instead repeat your dataframe using something vectorized like numpy.tile
until you have the desired number of rows, then adjust the values accordingly:
n = 100
first_day = df["DAY_COUNTER"].iloc[0]
rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]
out["DAY_COUNTER"] = range(first_day, n + first_day)
ID PHASENAME C_DAYS Multi_Factor DAY_COUNTER DAILY_LABOR_PERCENT
0 BAR C 14.0 7.142857 1 1.0
1 BAR C 14.0 7.142857 2 5.0
2 BAR C 14.0 7.142857 3 9.0
3 BAR C 14.0 7.142857 4 11.0
4 BAR C 14.0 7.142857 5 10.0
.. ... ... ... ... ... ...
95 BAR C 14.0 7.142857 96 7.0
96 BAR C 14.0 7.142857 97 4.0
97 BAR C 14.0 7.142857 98 0.0
98 BAR C 14.0 7.142857 99 1.0
99 BAR C 14.0 7.142857 100 5.0