I'm working with a multidimensional data array that holds several data points per individual. I wrote a nested loop to calculate metrics across the entire dataset, but when I rearrange the results I lose data points: from my initial 253 individuals, I end up with calculated metrics for only 182. The code runs, but I can't tell at which point data is being dropped.
```python
import datetime as dt

import numpy as np
import pandas as pd

# data_array: contains 253 individuals, each with several subcategories
mos0_ids = []
mos0_dt = []
mos0_x_dpos = []
mos0_y_dpos = []
mos0_z_dpos = []
for i in range(0, 252):
    mos0 = data_array[i]
    mos0_id = mos0[0][0]      # ID of this individual
    mos0_time = mos0[:, 1]    # timestamp strings
    mos0_x_pos = mos0[:, 2]
    mos0_y_pos = mos0[:, 3]
    mos0_z_pos = mos0[:, 4]
    mos0_speed = mos0[:, 6]
    # append the ID len(mos0_id) times
    for j in range(0, len(mos0_id)):
        mos0_ids.append(mos0_id)
    # time between row k and row k-1, in seconds
    for k in range(0, len(mos0_time)):
        first_mov_time = mos0_time[k]
        last_mov_time = mos0_time[k-1]
        first_movement = dt.datetime.strptime(first_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        last_movement = dt.datetime.strptime(last_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        x = first_movement - last_movement
        total_seconds = x.total_seconds()
        mos0_dt.append(total_seconds)
    # x, y and z differences between each row and the row before it
    for l in range(0, len(mos0_x_pos)):
        first_mov_pos = mos0_x_pos[l]
        last_mov_pos = mos0_x_pos[l-1]
        x = first_mov_pos - last_mov_pos
        mos0_x_dpos.append(x)
    for m in range(0, len(mos0_y_pos)):
        first_mov_pos = mos0_y_pos[m]
        last_mov_pos = mos0_y_pos[m-1]
        x = first_mov_pos - last_mov_pos
        mos0_y_dpos.append(x)
    for n in range(0, len(mos0_z_pos)):
        first_mov_pos = mos0_z_pos[n]
        last_mov_pos = mos0_z_pos[n-1]
        x = first_mov_pos - last_mov_pos
        mos0_z_dpos.append(x)
mos0_ids
mos0_dt
mos0_x_dpos
mos0_y_dpos
mos0_z_dpos
time_pos = list(zip(mos0_ids, mos0_dt, mos0_x_dpos, mos0_y_dpos, mos0_z_dpos))
time_pos = pd.DataFrame(time_pos, columns=['mos_id', 'dtime', 'x_position', 'y_position', 'z_position'])  # transform into a dataframe
time_pos['x_velocity'] = time_pos['x_position'] / time_pos['dtime']
time_pos['y_velocity'] = time_pos['y_position'] / time_pos['dtime']
time_pos['z_velocity'] = time_pos['z_position'] / time_pos['dtime']
time_pos['x_acceleration'] = time_pos['x_velocity'] / time_pos['dtime']
time_pos['y_acceleration'] = time_pos['y_velocity'] / time_pos['dtime']
time_pos['z_acceleration'] = time_pos['z_velocity'] / time_pos['dtime']
time_pos = time_pos.groupby('mos_id')
time_pos = np.array(time_pos, dtype=object)
time_pos
```
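In the loops above, `k-1` (and likewise `l-1`, `m-1`, `n-1`) is `-1` on the first iteration, which Python reads as the last element, so each individual's first difference is taken against its own last row. For comparison, here is a minimal sketch of the same per-individual differencing done with pandas `groupby` on one long table instead of parallel lists. It assumes each individual is a 2-D object array with the column layout used above (0 = ID, 1 = timestamp string, 2 to 4 = x/y/z) and the timestamp format from the question; the toy data and IDs are hypothetical. It also computes acceleration as the change in velocity divided by `dtime`, rather than velocity divided by `dtime`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the column layout used above:
# 0 = ID, 1 = timestamp string, 2/3/4 = x/y/z position.
data_array = [
    np.array([
        ["A01", "2021-01-01 00:00:00.000", 0.0, 0.0, 0.0],
        ["A01", "2021-01-01 00:00:00.500", 1.0, 0.5, 0.0],
        ["A01", "2021-01-01 00:00:01.000", 3.0, 1.0, 0.5],
    ], dtype=object),
    np.array([
        ["B02", "2021-01-01 00:00:00.000", 5.0, 5.0, 5.0],
        ["B02", "2021-01-01 00:00:02.000", 4.0, 5.0, 6.0],
    ], dtype=object),
]

# Stack all individuals into one long table (first five columns only).
# Every row keeps its own ID, so no separate ID list can drift out of step.
df = pd.DataFrame(np.vstack(data_array)[:, :5],
                  columns=["mos_id", "time", "x", "y", "z"])
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S.%f")
df[["x", "y", "z"]] = df[["x", "y", "z"]].astype(float)

g = df.groupby("mos_id")
# Seconds since the previous row of the same individual; the first row
# of each individual gets NaN instead of wrapping around to its last row.
df["dtime"] = (df["time"] - g["time"].shift()).dt.total_seconds()

for axis in ["x", "y", "z"]:
    # Displacement and velocity relative to the previous row.
    df[f"d{axis}"] = g[axis].diff()
    df[f"{axis}_velocity"] = df[f"d{axis}"] / df["dtime"]

# Acceleration as change in velocity over time; regroup so the new
# velocity columns are included.
g = df.groupby("mos_id")
for axis in ["x", "y", "z"]:
    df[f"{axis}_acceleration"] = g[f"{axis}_velocity"].diff() / df["dtime"]

# Rows per individual now match the input row counts.
counts = df.groupby("mos_id").size().to_dict()
print(counts)
```

Because the ID travels with each row, `df.groupby("mos_id").describe()` reports each individual's real number of rows (minus the leading NaN in the difference columns).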
EDIT:
I rearranged the code to use `for i in range(0, 253)` and added an indent, as follows:
```python
for i in range(0, 253):
    mos0 = swarm_data_array[i]
    mos0_id = mos0[0][0]
    mos0_time = mos0[:, 1]
    mos0_x_pos = mos0[:, 2]
    mos0_y_pos = mos0[:, 3]
    mos0_z_pos = mos0[:, 4]
    mos0_speed = mos0[:, 6]
    for j in range(len(mos0_id)):
        mos0_ids.append(mos0_id)
    for k in range(len(mos0_time)):
        first_mov_time = mos0_time[k]
        last_mov_time = mos0_time[k-1]
        first_movement = dt.datetime.strptime(first_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        last_movement = dt.datetime.strptime(last_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        x = first_movement - last_movement
        total_seconds = x.total_seconds()
        mos0_dt.append(total_seconds)
    for l in range(len(mos0_x_pos)):
        first_mov_pos = mos0_x_pos[l]
        last_mov_pos = mos0_x_pos[l-1]
        x = first_mov_pos - last_mov_pos
        mos0_x_dpos.append(x)
    for m in range(len(mos0_y_pos)):
        first_mov_pos = mos0_y_pos[m]
        last_mov_pos = mos0_y_pos[m-1]
        x = first_mov_pos - last_mov_pos
        mos0_y_dpos.append(x)
    for n in range(len(mos0_z_pos)):
        first_mov_pos = mos0_z_pos[n]
        last_mov_pos = mos0_z_pos[n-1]
        x = first_mov_pos - last_mov_pos
        mos0_z_dpos.append(x)
mos0_ids
mos0_dt
mos0_x_dpos
mos0_y_dpos
mos0_z_dpos
time_pos = list(zip(mos0_ids, mos0_dt, mos0_x_dpos, mos0_y_dpos, mos0_z_dpos))
time_pos = pd.DataFrame(time_pos, columns=['mos_id', 'dtime', 'x_position', 'y_position', 'z_position'])  # transform into a dataframe
time_pos['x_velocity'] = time_pos['x_position'] / time_pos['dtime']
time_pos['y_velocity'] = time_pos['y_position'] / time_pos['dtime']
time_pos['z_velocity'] = time_pos['z_position'] / time_pos['dtime']
time_pos['x_acceleration'] = time_pos['x_velocity'] / time_pos['dtime']
time_pos['y_acceleration'] = time_pos['y_velocity'] / time_pos['dtime']
time_pos['z_acceleration'] = time_pos['z_velocity'] / time_pos['dtime']
time_pos = time_pos.groupby('mos_id')
```
The issue now is that after grouping the data with `groupby` and applying `.describe()`, every group shows a constant count of 26, which is not correct: some groups are larger than others. Could this be an error somewhere in the nested loop?
You are probably missing one specific behavior of `range()`: the stop value is exclusive. Your first loop, `range(0, 252)`, yields only 252 values (0 to 251) instead of 253, so the last individual is never processed.
Try this out in a console:
```python
>>> len(range(0, 252))
252
```
Because the array is nested (one matrix per individual), that missing outer iteration drops every row of the last individual from all of the calculations.
Solution: use `for i in range(0, 253):`, or, without hard-coding the count, `for i in range(len(data_array)):`.
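A small sketch of the difference, using a hypothetical list of 253 placeholder entries in place of `data_array`: a stop of `len(data_array)` visits every index exactly once, while adding 1 overshoots and raises `IndexError`. Iterating over the container directly (for example `for mos0 in data_array:`) avoids the index arithmetic altogether.

```python
# Hypothetical stand-in for data_array: 253 placeholder "individuals".
data_array = [f"individual_{n}" for n in range(253)]

# The stop value of range() is exclusive: range(0, 252) yields 0..251.
print(len(range(0, 252)))  # 252

# Deriving the stop from the data visits every valid index exactly once.
visited = [data_array[i] for i in range(len(data_array))]
print(len(visited))  # 253

# Adding 1 to the stop overshoots: index 253 does not exist.
overshoot_raised = False
try:
    for i in range(len(data_array) + 1):
        _ = data_array[i]
except IndexError:
    overshoot_raised = True
print(overshoot_raised)  # True

# Iterating over the container directly needs no index at all.
visited_direct = []
for individual in data_array:
    visited_direct.append(individual)
```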
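It is also worth checking your other `for` loops for the same pattern, although the inner ones written as `range(0, len(x))` already cover every index of `x`.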
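Regarding the constant count of 26 per group in the EDIT: a likely cause, assuming `mos0_id` is a string, is the loop `for j in range(len(mos0_id))`. `len()` of a string is its number of characters, so each ID is appended once per character rather than once per row, and `zip()` then silently stops at the shortest list, pairing IDs with the wrong rows. The sketch below reproduces this on hypothetical toy data (two individuals with 3-character IDs) and compares it with appending the ID once per row, i.e. `range(len(mos0_time))`.

```python
from collections import Counter

import numpy as np

# Hypothetical toy data: two individuals with 3-character string IDs,
# 5 rows and 2 rows long. Column 0 holds the ID, column 1 a placeholder.
data_array = [
    np.array([["A01", 0.0]] * 5, dtype=object),
    np.array([["B02", 0.0]] * 2, dtype=object),
]

ids_buggy, ids_fixed, rows = [], [], []
for mos0 in data_array:
    mos0_id = mos0[0][0]
    mos0_time = mos0[:, 1]
    # As in the question: len(mos0_id) is the length of the ID string (3),
    # not the number of rows of this individual.
    for j in range(len(mos0_id)):
        ids_buggy.append(mos0_id)
    # One ID per row instead.
    for j in range(len(mos0_time)):
        ids_fixed.append(mos0_id)
    # Stand-in for the per-row metrics, labelled with their true owner.
    for k in range(len(mos0_time)):
        rows.append(f"{mos0_id}-row{k}")

# zip() stops at the shortest list without any warning.
pairs_buggy = list(zip(ids_buggy, rows))
pairs_fixed = list(zip(ids_fixed, rows))

counts_buggy = Counter(mos_id for mos_id, _ in pairs_buggy)
counts_fixed = Counter(mos_id for mos_id, _ in pairs_fixed)
print(dict(counts_buggy))  # {'A01': 3, 'B02': 3}  (count equals ID length)
print(dict(counts_fixed))  # {'A01': 5, 'B02': 2}
print(pairs_buggy[3])      # ('B02', 'A01-row3')  (ID paired with another individual's row)
```

With the buggy version every group has the same size (the ID length), which matches the symptom of a constant count in `.describe()`.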