I have a list of float values that represent the times of observations. (Each value could actually be represented as an integer, but I am hoping to generalize for possible future cases.)
list_hrs = [4,6,8,8,10] # actual list is thousands of floats
I am trying to pad the list with zeros wherever a value does not match its index, while counting only a single occurrence of duplicate entries. For the example list, I would want:
list_hrs = [0,0,0,0,4,0,6,0,8,8,0,10]
The first four entries are 0 because there are four numbers from 0 to 3, inclusive. The 0 between 4 and 6 is wanted because 5 is missing; similarly for the 0 between 6 and 8. The 0 between 8 and 10 is wanted because the value 9 is missing. The duplicate 8's are left untouched, as they will be dealt with later in my code; only a single occurrence of the duplicate 8's should be counted before padding 0's. In other words, the second 8 simply takes the next slot (index 9), 9 is still treated as missing, and 10 ends up at index 11.
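A minimal pure-Python sketch of the rule described above, assuming the input is sorted and holds whole-number values: each value v goes to index int(v) plus the number of repeats seen so far, and zeros are appended until that index is reached. The name pad_with_zeros is hypothetical.

```python
def pad_with_zeros(hrs):
    """Hypothetical helper: pad a sorted list of whole-number times with zeros.

    Each value v is placed at index int(v) + (number of repeats seen so far),
    so every repeated value pushes all later values one slot to the right.
    """
    out = []
    dups = 0      # how many repeated entries have been seen so far
    prev = None
    for v in hrs:
        if v == prev:
            dups += 1  # a repeat takes the next slot instead of its own
        target = int(v) + dups
        # append zeros until the next free slot is the target position
        out.extend([0] * (target - len(out)))
        out.append(v)
        prev = v
    return out

print(pad_with_zeros([4, 6, 8, 8, 10]))
```

Appending only at the end avoids repeated list.insert calls, each of which shifts every later element.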
My first attempt was to try this:
for index in range(len(list_hrs)):
    if list_hrs != index:
        list_hrs.insert(index, 0)
>> [0, 0, 0, 0, 0, 4, 6, 8, 8, 10]
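A short demonstration of why this attempt only pads the front of the list: the condition compares the whole list with an integer, so it is always True, and the loop bound is fixed before any insertion happens.

```python
list_hrs = [4, 6, 8, 8, 10]

# A list never equals an int, so this condition is True on every iteration.
print(list_hrs != 0)  # True

# range(len(list_hrs)) is evaluated once, before the loop starts, so the loop
# runs exactly 5 times even though the list grows as zeros are inserted.
for index in range(len(list_hrs)):
    if list_hrs != index:
        list_hrs.insert(index, 0)

print(list_hrs)  # [0, 0, 0, 0, 0, 4, 6, 8, 8, 10]
```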
I then read several SO posts and came away with the impression that it's best to first make a list of 0's whose length equals the number of data points considered, and then replace the appropriate 0 entries with the non-zero values. So I tried the following:
def make_zeros(hrs=list_hrs):  # make list of 0's
    num_zer = int(max(hrs))
    list_zer = [0 for index in range(num_zer+1)]
    return list_zer
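As a quick check (assuming sorted input), the zeros list built this way is one slot short for the example, because each repeated value needs a slot of its own on top of the max + 1 slots:

```python
list_hrs = [4, 6, 8, 8, 10]

# What make_zeros allocates: int(max) + 1 slots.
naive_len = int(max(list_hrs)) + 1

# Each adjacent repeat (assumes sorted input) needs one extra slot.
n_dups = sum(1 for prev, cur in zip(list_hrs, list_hrs[1:]) if cur == prev)
needed_len = int(max(list_hrs)) + n_dups + 1

print(naive_len, needed_len)  # 11 12
```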
But I am unsure how to implement the condition that achieves the desired result from this point. I think there is a way to use enumerate to check whether the index matches the value at that index, but I am unsure how to proceed because of the duplicate entries (such as the 8's in the example above).
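Continuing the zeros-list direction with enumerate, one possible sketch (the helper fill_zeros is hypothetical, and it assumes sorted, non-empty input) first records a running count of repeats for each entry, then sizes the zeros list to fit and writes every value at int(value) + count:

```python
def fill_zeros(hrs):
    """Hypothetical helper: place sorted values into a preallocated list of zeros."""
    # Running count of repeats up to and including each entry.
    offsets = []
    dups = 0
    for i, v in enumerate(hrs):
        if i > 0 and v == hrs[i - 1]:
            dups += 1
        offsets.append(dups)

    # Last value is the max for sorted input; add one extra slot per repeat.
    out = [0] * (int(hrs[-1]) + dups + 1)
    for v, off in zip(hrs, offsets):
        out[int(v) + off] = v
    return out

print(fill_zeros([4, 6, 8, 8, 10]))
```

Both passes are linear, so this scales to thousands of entries without the cost of list.insert.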
Is this method a good direction to keep going in, or is there a more efficient / simpler way to achieve the desired result? Any help or advice would be appreciated.
Here's one vectorized approach using NumPy -
import numpy as np

def make_zeros_vectorized(A, dtype=float):
    a = np.asarray(A).astype(int)
    # shift each value up by the number of duplicates seen so far
    idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
    out = np.zeros(idx[-1]+1, dtype=dtype)
    out[idx] = A
    return out
Sample runs -
In [95]: A
Out[95]: [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
In [96]: make_zeros_vectorized(A)
Out[96]:
array([ 0., 0., 0., 0., 4., 0., 6., 0., 8., 8., 0.,
10., 10., 10., 0., 0., 0., 14., 0., 16.])
In [100]: A
Out[100]: [4.0, 4.0, 4.0, 4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
In [101]: make_zeros_vectorized(A)
Out[101]:
array([ 0., 0., 0., 0., 4., 4., 4., 4., 0., 6., 0.,
8., 8., 0., 10., 10., 10., 0., 0., 0., 14., 0.,
16.])
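The duplicate mask a[1:] == a[:-1] only catches repeats that sit next to each other, and idx[-1] is taken as the largest index, so the function above assumes sorted input. If the observation times might arrive unsorted, a hypothetical wrapper could sort them first:

```python
import numpy as np

def make_zeros_sorted(A, dtype=float):
    # Hypothetical wrapper: sort first so that repeats are adjacent and
    # the last index is the largest one.
    A = np.sort(np.asarray(A, dtype=float))
    a = A.astype(int)
    idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
    out = np.zeros(idx[-1] + 1, dtype=dtype)
    out[idx] = A
    return out

print(make_zeros_sorted([8.0, 4.0, 10.0, 6.0, 8.0]))
```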
Steps involved
Input list
In [71]: A = [4.0,6.0,8.0,8.0,10.0,10.0,10.0,14.0,16.0]
Convert to array
In [72]: a = np.asarray(A).astype(int)
In [73]: a
Out[73]: array([ 4, 6, 8, 8, 10, 10, 10, 14, 16])
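Since the question mentions generalizing to floats, note that astype(int) truncates toward zero. Assuming the times are meant to be whole numbers but may carry small floating-point error, rounding with np.rint before the cast avoids placing a value one slot too low (the input below is made up for illustration):

```python
import numpy as np

A = np.array([4.0, 5.999999999, 8.0])  # hypothetical: 6.0 computed with a tiny error

print(A.astype(int))           # truncates toward zero: [4 5 8]
print(np.rint(A).astype(int))  # rounds to nearest first: [4 6 8]
```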
Create a mask of duplicates. This is central to the approach, because we plan to cumulatively sum it: the duplicates are represented as True, so the cumulative sum counts how many duplicates have been seen so far. Those running counts serve as index offsets for placing the input values into the output array.
In [74]: a[1:] == a[:-1]
Out[74]: array([False, False, True, False, True, True, False, False], dtype=bool)
In [75]: (a[1:] == a[:-1]).cumsum()
Out[75]: array([0, 0, 1, 1, 2, 3, 3, 3])
Prepend a zero, since a[1:] == a[:-1] yields an array one element shorter than the input (the first element has no predecessor to duplicate).
In [76]: np.r_[0, (a[1:] == a[:-1]).cumsum()]
Out[76]: array([0, 0, 0, 1, 1, 2, 3, 3, 3])
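For reference, np.diff(a) == 0 builds the same duplicate mask, and np.concatenate is a more explicit way than np.r_ to prepend the zero. This sketch checks both against the steps above:

```python
import numpy as np

a = np.array([4, 6, 8, 8, 10, 10, 10, 14, 16])

# np.diff(a) == 0 marks the same positions as a[1:] == a[:-1]
dup = np.diff(a) == 0
assert np.array_equal(dup, a[1:] == a[:-1])

# np.concatenate as an alternative to np.r_ for prepending the 0
offsets = np.concatenate(([0], np.cumsum(dup)))
print(offsets)  # [0 0 0 1 1 2 3 3 3]
```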
Finally, add these offsets to the input array, so that each duplicate (and every value after it) is shifted up by one slot per preceding duplicate. This gives the indices at which the output array is to be assigned:
In [77]: a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
Out[77]: array([ 4, 6, 8, 9, 11, 12, 13, 17, 19])
The remaining steps create a zero-filled output array of length idx[-1] + 1 and assign the values of A into it at the indices obtained above.
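A small sanity check, assuming sorted integer-valued input: consecutive entries of idx always differ by at least 1 (a new value rises by at least 1 while its offset stays the same, and a repeat is bumped by exactly 1 through the offset). So no two input values are written to the same slot, and idx[-1] + 1 is enough room for the output.

```python
import numpy as np

a = np.array([4, 6, 8, 8, 10, 10, 10, 14, 16])
idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]

steps = np.diff(idx)
print(steps)                      # [2 2 1 2 1 1 4 2]
print(bool(np.all(steps >= 1)))  # True: every slot is written at most once
```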
If you need a mask of the padded zeros, or their indices, here's a modified version -
def get_zeros_mask(A):
    a = np.asarray(A).astype(int)
    idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
    mask = np.ones(idx[-1]+1, dtype=bool)
    mask[idx] = False  # slots holding input values are not padding
    return mask
Sample run -
In [93]: A
Out[93]: [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
In [94]: make_zeros_vectorized(A)
Out[94]:
array([ 0., 0., 0., 0., 4., 0., 6., 0., 8., 8., 0.,
10., 10., 10., 0., 0., 0., 14., 0., 16.])
In [95]: get_zeros_mask(A)
Out[95]:
array([ True, True, True, True, False, True, False, True, False,
False, True, False, False, False, True, True, True, False,
True, False], dtype=bool)
In [96]: np.flatnonzero(get_zeros_mask(A))
Out[96]: array([ 0, 1, 2, 3, 5, 7, 10, 14, 15, 16, 18])
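To go the other way, the mask can recover the original values from the padded output. This sketch rebuilds both arrays inline so that it runs on its own. Filtering with the mask rather than out != 0 also keeps any genuine observation at time 0.0.

```python
import numpy as np

A = [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
a = np.asarray(A).astype(int)
idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]

# Padded output and padding mask, built as in the two functions above.
out = np.zeros(idx[-1] + 1)
out[idx] = A
mask = np.ones(idx[-1] + 1, dtype=bool)
mask[idx] = False

# The non-padded slots hold the original values, in their original order.
print(out[~mask])
# The padded positions are the True entries of the mask.
print(np.flatnonzero(mask))
```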