python, arrays, python-2.7, numpy, arcpy

How to keep values at a fixed number of random positions in a 2D NumPy array and replace the rest with a no-data value, without changing anything else


I am new to scientific computing. I have a 2D NumPy array (say, A) with shape (11153L, 4218L) and dtype('uint8'). I want to keep the data at some number (say, 10000) of random (row, col) positions and fill the rest with a no-data value. How can I do this?

Here, the no-data value is obtained from another variable in my environment, e.g. my_raster_nodata_values = dsc.noDataValue


Solution

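  • You could use np.random.choice with the optional argument replace set to False to select unique flat indices, a.size - 10000 of them, and set the elements at those indices to no_data_value. Note that a.ravel() returns a view only when a is contiguous; for a non-contiguous array it returns a copy, and the assignment would not reach a. An implementation would be -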

    a.ravel()[np.random.choice(a.size, a.size-10000, replace=False)] = no_data_value
    

    Alternatively, we can use np.put, which may read more intuitively, like so -

    np.put(a, np.random.choice(a.size, a.size-10000, replace=False), no_data_value)
    

    A sample run should make it easier to understand -

    In [94]: a     # Input array
    Out[94]: 
    array([[163,  80, 142, 169, 214],
           [  7,  59, 102, 104, 234],
           [ 44, 143,   7,  30, 232],
           [ 71,  15,  64,  42, 141]])
    
    In [95]: no_data_value = 0  # No value specifier
    
    In [98]: N = 10 # Number of elems to keep
    
    In [99]: a.ravel()[np.random.choice(a.size,a.size-N,replace=0)] = no_data_value
    
    In [100]: a
    Out[100]: 
    array([[  0,   0, 142,   0,   0],
           [  7,   0,   0, 104, 234],
           [  0,   0,   7,  30, 232],
           [ 71,   0,  64,   0, 141]])
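
For an array as large as the one in the question, drawing a.size - N indices to overwrite means sampling almost the whole array. A possible alternative, sketched below, is to start from an array filled with the no-data value and copy back only the N cells to keep. The function name keep_random_cells and its n_keep and seed parameters are hypothetical, and np.random.default_rng assumes NumPy 1.17 or later, which is newer than the Python 2.7 setup in the question.

```python
import numpy as np

def keep_random_cells(a, n_keep, no_data_value, seed=None):
    """Return a copy of `a` in which only `n_keep` random cells keep their value.

    Hypothetical helper for illustration; `a` itself is left untouched.
    """
    rng = np.random.default_rng(seed)  # assumption: NumPy >= 1.17
    # Start from an array that is entirely no-data ...
    out = np.full_like(a, no_data_value)
    # ... then copy back the values at n_keep unique flat positions.
    keep_idx = rng.choice(a.size, size=n_keep, replace=False)
    out.flat[keep_idx] = a.flat[keep_idx]
    return out

a = np.arange(1, 21, dtype=np.uint8).reshape(4, 5)  # made-up input, no zeros
result = keep_random_cells(a, n_keep=10, no_data_value=0, seed=0)
```

Because the indexing goes through .flat rather than ravel(), this sketch does not depend on the array being contiguous. It does allocate a second array of the same size, which may or may not be acceptable for a large raster.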
    

    If the input array already contains elements equal to no_data_value, we might want to reduce the number of elements to set by that count, and sample only from positions that are not already no_data_value. For that case, a modified version would be -

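    # Number of valid elements to overwrite so that N valid ones remain
    S = a.size - N - (a == no_data_value).sum()
    # Pick S unique flat indices among elements that are not already no-data
    idx = np.random.choice(np.flatnonzero(a != no_data_value), S, replace=False)
    # Overwrite them (a.ravel() is a view only if a is contiguous)
    a.ravel()[idx] = no_data_value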
    

    Sample run -

    In [65]: a
    Out[65]: 
    array([[240,  30,  61,  38, 145],
           [ 91,  65, 108, 154, 118],
           [155, 198,  65,  65, 189],
           [248, 140, 154, 186, 186]])
    
    In [66]: no_data_value = 65  # No value specifier
    
    In [67]: N = 10 # Number of elems to keep
    
    In [68]: S = a.size - N - (a == no_data_value).sum()
    
    In [69]: idx = np.random.choice(np.flatnonzero(a!=no_data_value),S,replace=0)
    
    In [70]: a.ravel()[idx] = no_data_value
    
    In [71]: a
    Out[71]: 
    array([[240,  30,  61,  38,  65],
           [ 65,  65, 108,  65,  65],
           [ 65, 198,  65,  65,  65],
           [248, 140, 154, 186,  65]])
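
The modified version above can be wrapped in a small function that also guards against two edge cases: an array that already has N or fewer valid elements (where S would be zero or negative) and a non-contiguous array (where a.ravel() would return a copy). This is only a sketch under stated assumptions: the name thin_valid_cells and its parameters are hypothetical, and np.random.default_rng assumes NumPy 1.17 or later.

```python
import numpy as np

def thin_valid_cells(a, n_keep, no_data_value, seed=None):
    """Overwrite random valid cells of `a` in place so that n_keep remain.

    Hypothetical helper for illustration; returns `a` for convenience.
    """
    rng = np.random.default_rng(seed)  # assumption: NumPy >= 1.17
    # Flat positions of cells that are not already no-data.
    valid_idx = np.flatnonzero(a != no_data_value)
    n_drop = valid_idx.size - n_keep  # same quantity as S in the answer
    if n_drop <= 0:
        return a  # already n_keep or fewer valid cells; nothing to do
    drop_idx = rng.choice(valid_idx, size=n_drop, replace=False)
    # .flat assignment writes into `a` even when it is not contiguous.
    a.flat[drop_idx] = no_data_value
    return a

# Input array taken from the sample run above.
a = np.array([[240,  30,  61,  38, 145],
              [ 91,  65, 108, 154, 118],
              [155, 198,  65,  65, 189],
              [248, 140, 154, 186, 186]], dtype=np.uint8)
thin_valid_cells(a, n_keep=10, no_data_value=65, seed=0)
```

Which cells survive depends on the seed, but the number of cells different from no_data_value afterwards should equal n_keep whenever the input has at least that many valid cells.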