Sample data (columns: year, TC number, maximum wind speed):
import numpy as np
import pandas as pd

data = pd.DataFrame({
'year': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
'TC_number': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
'maximum_wind_speed': [20.37199783, 21.2, 21.7, 14.626, 18.108, 21.4, 25.3, 25.3, 22.9, 18.108, 20.2, 22.1, 24.3, 25.5, 27.7, 29.8, 33.6, 36.7, 36.6, 35, 33, 29.7, 29, 20]})
Hi All,
I've searched online for solutions, but none seem to be what I am looking for.
I know what I want to do, but I am stuck on how to implement it.
I first initialize a (1000, 240) array. I then want a loop that fills in each row of the array. Each row holds the recorded maximum wind speed values of a single Tropical Cyclone (TC), and 240 is the maximum number of values that a TC could have. However, each TC has a varying number of values recorded in the maximum wind speed column. I want the loop to jump to the next row when the current TC number does not equal the previous TC number.
This is what I have so far:
output_array = np.full((1000, 240), np.nan)
#Shape of vmaxsyn is (337079,)
for i in range(1000):
    #print("i = ", i)
    for j in range(241):
        #print("j = ", j)
        name_id1 = df.iloc[j]['TC_number']
        name_id2 = df.iloc[j-1]['TC_number']
        if name_id1 == name_id2:
            output_array[i, j] = vmaxsyn[j]
            #print(output_array[j,i])
            #print([i,j])
        else:
            #print("breaking out of inner loop")
            break
    #print("breaking out of outer loop.")
I was expecting something like this:
data = [
    [20.372, 21.2, 21.7, 14.626, np.nan, np.nan],
    [18.108, 21.4, 25.3, 25.3, 22.9, np.nan],
    [18.108, 20.2, 22.1, 24.3, np.nan, np.nan],
    [25.5, 27.7, 29.8, 33.6, np.nan, np.nan],
    [36.7, 36.6, 35, np.nan, np.nan, np.nan],
    [33, 29.7, 29, 20, np.nan, np.nan]]
The problem is that none of the vmaxsyn values are being written to my output array. I am also running into a broadcast error with my other approach. Any help is greatly appreciated. I'm specifically trying to accomplish this with pandas.
You don't need a for loop here at all. First, add an id column to your data that increments whenever TC_number changes from one row to the next. Then group your data by this newly created id and use the groupby apply method to convert each group's wind speeds into a list.
data['tc_id'] = data['TC_number'].ne(data['TC_number'].shift()).cumsum()-1
array = data.groupby('tc_id')['maximum_wind_speed'].apply(list)
The result looks like this:
print(array)
tc_id
0 [20.37199783, 21.2, 21.7, 14.626]
1 [18.108, 21.4, 25.3, 25.3, 22.9]
2 [18.108, 20.2, 22.1, 24.3]
3 [25.5, 27.7, 29.8, 33.6]
4 [36.7, 36.6, 35.0]
5 [33.0, 29.7, 29.0, 20.0]
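Note that this gives a Series of lists rather than the NaN-padded 2D array described in the question. One possible way to get that array, as a minimal sketch: number each record within its cyclone using groupby(...).cumcount(), then use the (tc_id, position) pairs as indices into a preallocated NaN array. The column name pos and the way the array shape is derived from the data are my own assumptions for illustration; a fixed shape such as (1000, 240) could be passed to np.full instead, provided the data fits.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the year column is not needed for this step).
data = pd.DataFrame({
    'TC_number': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2,
                  0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    'maximum_wind_speed': [20.37199783, 21.2, 21.7, 14.626, 18.108, 21.4,
                           25.3, 25.3, 22.9, 18.108, 20.2, 22.1, 24.3,
                           25.5, 27.7, 29.8, 33.6, 36.7, 36.6, 35, 33,
                           29.7, 29, 20],
})

# Run id: increments each time TC_number differs from the previous row.
data['tc_id'] = data['TC_number'].ne(data['TC_number'].shift()).cumsum() - 1

# Position of each record within its cyclone (0, 1, 2, ...).
# 'pos' is a hypothetical helper column, not part of the original data.
data['pos'] = data.groupby('tc_id').cumcount()

# Preallocate a NaN array just large enough for this data.
n_rows = int(data['tc_id'].max()) + 1
n_cols = int(data['pos'].max()) + 1
output_array = np.full((n_rows, n_cols), np.nan)

# Write every wind speed to its (cyclone, position) cell in one step.
output_array[data['tc_id'].to_numpy(), data['pos'].to_numpy()] = (
    data['maximum_wind_speed'].to_numpy()
)

print(output_array.shape)
```

One caveat: the run id only looks at TC_number, so two consecutive cyclones that happen to share the same number (for example across a year boundary) would be merged into one row. If that can occur in the full data set, the change detection would need to compare year as well.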