How can one pad zeroes in-between non-consecutive float elements of a sorted list that contains duplicates?


I have a list of float values that represent the time of an observation. (Each float value can actually be represented as an integer, but I am hoping to generalize for possible future circumstances).

list_hrs = [4,6,8,8,10] # actual list is thousands of floats

I am trying to pad the values that don't match their indices with zero while counting only a single occurrence of duplicate entries. Per the example list, I would want

list_hrs = [0,0,0,0,4,0,6,0,8,8,0,10]

The first four entries are 0 because there are four integers from 0 to 3 inclusive, none of which appear in the list. The 0 between 4 and 6 is wanted because 5 is missing; similarly for the 0 between 6 and 8, and for the 0 between 8 and 10 (9 is missing). The duplicate 8's are left untouched, as they will be dealt with later in my code; only a single occurrence of the duplicate 8's should be counted when deciding where to pad 0's.

My first attempt was to try this:

for index in range(len(list_hrs)):
    if list_hrs != index:
        list_hrs.insert(index, 0)

>> [0, 0, 0, 0, 0, 4, 6, 8, 8, 10]

I then read different SO posts and came away with the impression that it's best to first make a list of 0's, for which the length should be equal to the number of data points considered. Then, the non-zero entries can replace the 0 entries. So, I tried the following:

def make_zeros(hrs=list_hrs): # make list of 0's
    num_zer = int(max(hrs))
    list_zer = [0 for index in range(num_zer+1)]
    return list_zer

But I am unsure of how to implement the condition to achieve the desired result after this point. I am thinking there is a way to use enumerate to check if the index matches the value at that index, but am unsure how to proceed due to duplicate entries (such as the 8's in the example above).

Is this method a good direction to keep going in, or is there a more efficient / simpler way to achieve the desired result? Any help or advice would be appreciated.


Solution

  • Here's one vectorized approach -

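    import numpy as np

    def make_zeros_vectorized(A, dtype=float):
        # Integer view of the sorted input values
        a = np.asarray(A).astype(int)
        # Target index = value + running count of repeated entries
        idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
        out = np.zeros(idx[-1]+1,dtype=dtype)
        out[idx] = A
        return out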
    

    Sample runs -

    In [95]: A
    Out[95]: [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
    
    In [96]: make_zeros_vectorized(A)
    Out[96]: 
    array([  0.,   0.,   0.,   0.,   4.,   0.,   6.,   0.,   8.,   8.,   0.,
            10.,  10.,  10.,   0.,   0.,   0.,  14.,   0.,  16.])
    
    In [100]: A
    Out[100]: [4.0, 4.0, 4.0, 4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
    
    In [101]: make_zeros_vectorized(A)
    Out[101]: 
    array([  0.,   0.,   0.,   0.,   4.,   4.,   4.,   4.,   0.,   6.,   0.,
             8.,   8.,   0.,  10.,  10.,  10.,   0.,   0.,   0.,  14.,   0.,
            16.])
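
For comparison, the same placement can be written as a plain-Python loop that follows the pad-with-zeros idea from the question. This is only a sketch: the name make_zeros_loop is hypothetical (it is not part of the original answer), and it assumes the input is sorted, non-negative, and that each value is integer-valued.

```python
def make_zeros_loop(hrs):
    """Pad missing integer slots with 0; keep duplicates side by side."""
    out = []
    expected = 0   # next integer slot that has not been filled yet
    prev = None    # previous value, used to detect duplicates
    for value in hrs:
        if value != prev:
            # One 0 for every integer skipped since the last distinct value
            out.extend([0] * (int(value) - expected))
            expected = int(value) + 1
        out.append(value)
        prev = value
    return out

print(make_zeros_loop([4, 6, 8, 8, 10]))
# [0, 0, 0, 0, 4, 0, 6, 0, 8, 8, 0, 10]
```

The loop is easier to follow but runs element by element in Python, so for lists of thousands of values the vectorized version is likely the faster choice.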
    

    Steps involved

    Input list

    In [71]: A = [4.0,6.0,8.0,8.0,10.0,10.0,10.0,14.0,16.0]
    

    Convert to array

    In [72]: a = np.asarray(A).astype(int)
    
    In [73]: a
    Out[73]: array([ 4,  6,  8,  8, 10, 10, 10, 14, 16])
    

    Create a mask of duplicates. This is central to the approach, because the mask is then cumulatively summed. With duplicates marked as True, the cumulative sum gives a running count of repeated entries, and that count is the offset by which each later value must be shifted when it is placed into the output array.

    In [74]: a[1:] == a[:-1]
    Out[74]: array([False, False,  True, False,  True,  True, False, False], dtype=bool)
    
    In [75]: (a[1:] == a[:-1]).cumsum()
    Out[75]: array([0, 0, 1, 1, 2, 3, 3, 3])
    

    Prepend a zero, since "a[1:] == a[:-1]" produces an array that is one element shorter than a

    In [76]: np.r_[0, (a[1:] == a[:-1]).cumsum()]
    Out[76]: array([0, 0, 0, 1, 1, 2, 3, 3, 3])
    

    Finally, add these offsets to the input array, so that each duplicate is shifted one slot further along. The result is the set of indices at which the input values are to be assigned in the output array

    In [77]: a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
    Out[77]: array([ 4,  6,  8,  9, 11, 12, 13, 17, 19])
    

    The remaining steps create an output array of zeros whose length is the last index plus one, and assign the values from A into it at the indices obtained above.
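
Those remaining steps can be sketched on the same example input. This is a minimal sketch that reuses the names A, a, idx and out from make_zeros_vectorized above and adds nothing beyond what that function already does.

```python
import numpy as np

A = [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
a = np.asarray(A).astype(int)

# Target index of each element: its value plus the count of
# repeated entries seen so far (the element itself included)
idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]

# The last index fixes the output length; unassigned slots stay 0
out = np.zeros(idx[-1] + 1, dtype=float)
out[idx] = A

print(out.tolist())
```

The printed list should match the first sample run shown earlier.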


    If you need the mask of zeros or those indices, here's a modified version -

    def get_zeros_mask(A):
        a = np.asarray(A).astype(int)
        idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]
        mask = np.ones(idx[-1]+1,dtype=bool)
        mask[idx] = 0
        return mask
    

    Sample run -

    In [93]: A
    Out[93]: [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
    
    In [94]: make_zeros_vectorized(A)
    Out[94]: 
    array([  0.,   0.,   0.,   0.,   4.,   0.,   6.,   0.,   8.,   8.,   0.,
            10.,  10.,  10.,   0.,   0.,   0.,  14.,   0.,  16.])
    
    In [95]: get_zeros_mask(A)
    Out[95]: 
    array([ True,  True,  True,  True, False,  True, False,  True, False,
           False,  True, False, False, False,  True,  True,  True, False,
            True, False], dtype=bool)
    
    In [96]: np.flatnonzero(get_zeros_mask(A))
    Out[96]: array([ 0,  1,  2,  3,  5,  7, 10, 14, 15, 16, 18])
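
One possible use of the mask, sketched under the assumption that the padded array and the mask are built from the same input: inverting the mask selects the original observations back out of the padded array, and summing it counts the padded slots. The names recovered and n_padded are illustrative only.

```python
import numpy as np

A = [4.0, 6.0, 8.0, 8.0, 10.0, 10.0, 10.0, 14.0, 16.0]
a = np.asarray(A).astype(int)
idx = a + np.r_[0, (a[1:] == a[:-1]).cumsum()]

# Same construction as get_zeros_mask: True where a zero was padded in
mask = np.ones(idx[-1] + 1, dtype=bool)
mask[idx] = False

# Padded output, built as in make_zeros_vectorized
out = np.zeros(idx[-1] + 1, dtype=float)
out[idx] = A

# Inverting the mask picks out the original observations, in order
recovered = out[~mask]
# Number of padded positions
n_padded = int(mask.sum())
```

Because the mask records positions rather than values, this keeps working even if a genuine observation at hour 0 is present in the data, where testing out == 0 would not.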