Numpy signal processing: Efficiently summarize subarrays of consecutive nonzero elements

So, I'm working with FFT data that, when thoroughly cleaned, should look like many, many zeros and an occasional large number. What I have at the moment is many zeros and an occasional short subarray of largish numbers. For example:

ydata = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])
xdata = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

I'm using an array of consecutive integers for xdata, but these values could be any sequence of strictly increasing numbers.

What I would like to do is set all current y values to 0 and add one value per nonzero subarray. For each subarray, the new y value should be the sum of its elements, and it should be placed at the index closest to the weighted average of the subarray's x values (weighted by the y values).

For example, the first subarray of consecutive nonzero elements in ydata is [1, 2, 3], with corresponding x values [4, 5, 6]. The sum of the y values is 6, and the weighted average of the x values is (4*1 + 5*2 + 6*3)/6 = (4 + 10 + 18)/6 ≈ 5.33, which rounds to 5. Following this pattern, I'd like to get:

ydata = np.array([0, 0, 0, 0, 0, 6, 0, 0, 0, 13, 0, 0, 0, 0, 11, 0])
xdata = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
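To make the arithmetic concrete, here is a small sketch that computes the first subarray's sum and weighted x average with np.average (the variable names are just for illustration):

```python
import numpy as np

# First run of nonzero y values and the x values it sits at
y_run = np.array([1, 2, 3])
x_run = np.array([4, 5, 6])

run_sum = y_run.sum()                              # 1 + 2 + 3 = 6
weighted_mean = np.average(x_run, weights=y_run)   # (4 + 10 + 18) / 6 = 32 / 6
target_index = int(np.round(weighted_mean))        # 5.33... rounds to 5

print(run_sum, weighted_mean, target_index)
```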

If instead we have:

ydata = np.array([1, 2, 3, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])
xdata = np.array([.1, .4, .6, 3, 4.1, 4.2, 4.3, 7, 8, 9, 10, 11, 12, 13, 14, 15])

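then ydata should become the following (the weighted x averages of the four runs are about 0.45, 4.23, 9.38 and 13.82, which round to indices 0, 4, 9 and 14):

ydata = np.array([6, 0, 0, 0, 6, 0, 0, 0, 0, 13, 0, 0, 0, 0, 11, 0])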

I could do this with plain Python iteration, but that would be impractical for data of the size I'm working with. Does anyone know a good way to do this in a numpythonic (vectorized) way?
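For reference, a plain-loop version of the transformation might look like the sketch below. The function name summarize_runs_loop is hypothetical, and it assumes the rounded weighted x values are valid, non-negative indices (true for both examples). It is slow on large arrays but handy for checking a vectorized version:

```python
import numpy as np

def summarize_runs_loop(ydata, xdata):
    """Loop reference (hypothetical helper): collapse each run of consecutive
    nonzero y values into its sum, placed at round(weighted mean of its x)."""
    ydata = np.asarray(ydata)
    xdata = np.asarray(xdata, dtype=float)
    # Assumes rounded x values are non-negative indices; size from max(x)
    out = np.zeros(int(np.round(xdata.max())) + 1, dtype=ydata.dtype)
    i, n = 0, len(ydata)
    while i < n:
        if ydata[i] == 0:
            i += 1
            continue
        j = i
        while j < n and ydata[j] != 0:  # extend j to one past the run's end
            j += 1
        y_run, x_run = ydata[i:j], xdata[i:j]
        total = y_run.sum()
        center = (x_run * y_run).sum() / total  # weighted average of x
        out[int(np.round(center))] = total
        i = j
    return out
```

On the two examples above it returns the two expected arrays.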

Solution

You need to calculate a few things here:

First, get the cumulative sum of the entire array:

cs = np.cumsum(ydata)
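On the first example, the running total stays flat over zeros and only changes inside runs:

```python
import numpy as np

ydata = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])

# Running total of y; a run's sum is how much cs grows across that run
cs = np.cumsum(ydata)
# cs -> [0, 0, 0, 0, 1, 3, 6, 6, 6, 15, 18, 19, 19, 21, 30, 30]
print(cs)
```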

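Next, find the locations where the current element is nonzero, the next element is zero, and the cumulative sum is not zero, i.e. the last element of each run of nonzero values. A nonzero final element also counts as a run end, because the second line only updates filter_locs[:-1]. (For non-negative data, the cs != 0 test is already implied by ydata != 0.)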

filter_locs = (ydata != 0) & (cs != 0)
filter_locs[:-1] = filter_locs[:-1] & (ydata[1:] == 0)
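A quick check of the mask on the first example, using np.flatnonzero to list the run-end positions. The second array, y_tail, is a made-up input showing that a run reaching the end of the array is still detected:

```python
import numpy as np

ydata = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])
cs = np.cumsum(ydata)

# True at the last element of each run of nonzero values
filter_locs = (ydata != 0) & (cs != 0)
filter_locs[:-1] &= ydata[1:] == 0
run_ends = np.flatnonzero(filter_locs)  # -> [6, 11, 14]

# Illustrative input whose last run touches the end of the array
y_tail = np.array([0, 5, 0, 7, 8])
m = (y_tail != 0) & (np.cumsum(y_tail) != 0)
m[:-1] &= y_tail[1:] == 0
tail_ends = np.flatnonzero(m)  # -> [1, 4]; index 4 is never cleared by [:-1]
```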

Because the cumulative sum only changes inside runs, each run's sum is the difference between consecutive cumulative sums at these locations:

new_values = cs[filter_locs]
new_values[1:] -= new_values[:-1]
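With the first example, the cumulative sums at the run ends are 6, 19 and 30. The in-place subtraction relies on NumPy (1.13 and later) handling overlapping input and output slices correctly; np.diff with prepend=0 (NumPy 1.16+) is an equivalent form that sidesteps the question. A sketch comparing the two, with the run-end positions hard-coded from the previous step:

```python
import numpy as np

ydata = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])
cs = np.cumsum(ydata)
run_ends = np.array([6, 11, 14])  # last index of each nonzero run

# Cumulative sums at the run ends: [6, 19, 30]
ends_cs = cs[run_ends]

# In-place form used above (overlapping slices, safe in NumPy >= 1.13)
new_values = ends_cs.copy()
new_values[1:] -= new_values[:-1]

# Equivalent non-in-place form; prepend=0 keeps the first run's total as-is
new_values_diff = np.diff(ends_cs, prepend=0)

print(new_values, new_values_diff)  # both equal [6, 13, 11]
```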

The same trick applied to xdata * ydata gives each run's weighted x total; dividing by the run's sum gives its weighted average, which we round to get the target index:

cs_x = np.cumsum(xdata * ydata)
weighted_totals = cs_x[filter_locs]
weighted_totals[1:] -= weighted_totals[:-1]

new_indices = np.round(weighted_totals / new_values).astype(int)
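Continuing the first example (run ends and run sums hard-coded from the earlier steps), the weighted totals are 32, 122 and 152, giving centers of roughly 5.33, 9.38 and 13.82. One caveat: np.round rounds exact halves to the nearest even number, so a center of exactly 4.5 would go to index 4. If ties should round up instead, np.floor(centers + 0.5) is one option for non-negative values:

```python
import numpy as np

xdata = np.arange(16)
ydata = np.array([0, 0, 0, 0, 1, 2, 3, 0, 0, 9, 3, 1, 0, 2, 9, 0])
run_ends = np.array([6, 11, 14])    # from the run-end step
new_values = np.array([6, 13, 11])  # run sums from the previous step

cs_x = np.cumsum(xdata * ydata)
weighted_totals = np.diff(cs_x[run_ends], prepend=0)  # [32, 122, 152]
centers = weighted_totals / new_values                # ~[5.33, 9.38, 13.82]
new_indices = np.round(centers).astype(int)           # [5, 9, 14]

# np.round sends exact halves to the nearest even value
ties = np.round([0.5, 1.5, 2.5])                      # [0., 2., 2.]
# Round-half-up alternative for non-negative centers
half_up = np.floor(centers + 0.5).astype(int)         # [5, 9, 14]
```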

Finally, write the run sums into an array of zeros at those indices:

new_data = np.zeros_like(ydata)
new_data[new_indices] = new_values

Which gives us the desired array:

array([ 0,  0,  0,  0,  0,  6,  0,  0,  0, 13,  0,  0,  0,  0, 11,  0])

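If xdata is not an array of indices, the largest possible index in the output is the rounded maximum of xdata (a weighted average can never exceed the largest x value), so instead of new_data = np.zeros_like(ydata), do the following. This assumes xdata is non-negative, since a negative rounded center would index from the end of the array: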

max_possible_index = int(np.round(xdata.max()))
# Keep ydata's dtype so integer input gives an integer result
new_data = np.zeros(max_possible_index + 1, dtype=ydata.dtype)
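Putting the steps together, a self-contained sketch might look like this. summarize_runs is a hypothetical name; it assumes non-negative ydata and xdata (so the rounded centers are valid indices) and sizes the output from the rounded maximum of xdata as described above. It uses np.diff(..., prepend=0) in place of the in-place subtraction, which needs NumPy 1.16 or later:

```python
import numpy as np

def summarize_runs(ydata, xdata):
    """Hypothetical helper: collapse each run of consecutive nonzero ydata
    values into its sum, written at round(weighted mean of the run's xdata).
    Assumes non-negative ydata and xdata, as in the question's examples."""
    ydata = np.asarray(ydata)
    xdata = np.asarray(xdata, dtype=float)

    cs = np.cumsum(ydata)
    cs_x = np.cumsum(xdata * ydata)

    # Last element of each nonzero run (a nonzero final element ends a run)
    ends = (ydata != 0) & (cs != 0)
    ends[:-1] &= ydata[1:] == 0

    # Per-run sums and weighted x totals from differences of cumulative sums
    new_values = np.diff(cs[ends], prepend=0)
    weighted_totals = np.diff(cs_x[ends], prepend=0)
    new_indices = np.round(weighted_totals / new_values).astype(int)

    # Largest possible index is the rounded maximum of xdata
    new_data = np.zeros(int(np.round(xdata.max())) + 1, dtype=ydata.dtype)
    new_data[new_indices] = new_values
    return new_data
```

With the two examples from the question, it should return [0, 0, 0, 0, 0, 6, 0, 0, 0, 13, 0, 0, 0, 0, 11, 0] and [6, 0, 0, 0, 6, 0, 0, 0, 0, 13, 0, 0, 0, 0, 11, 0] respectively. If two runs could round to the same index, plain assignment keeps only one of them; np.add.at(new_data, new_indices, new_values) would accumulate them instead.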