Tags: python, numpy, list-comprehension, hpc

How to compare two lists and extract position, index and neighbors?


Let's assume we have two lists:

list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8, 9, 10]

That's the basic structure:

     Columns
Rows    0      1      2      3      4
0       1      2      3      4      5
1       6      7      8      9      10

For each element, I want to print a line with some metadata (index, position and neighbors):

row/col   Print statement
0/0       "Row index=0, Column Index=0, Value=1, Value below=6, Value to the right = 2"
0/1       "Row index=0, Column Index=1, Value=2, Value below=7, Value to the right = 3"
0/2       "Row index=0, Column Index=2, Value=3, Value below=8, Value to the right = 4"
0/3       "Row index=0, Column Index=3, Value=4, Value below=9, Value to the right = 5"
0/4       "Row index=0, Column Index=4, Value=5, Value below=10, Value to the right = NaN"
1/0       "Row index=1, Column Index=0, Value=6, Value below=NaN, Value to the right = 7"
1/1       "Row index=1, Column Index=1, Value=7, Value below=NaN, Value to the right = 8"
1/2       "Row index=1, Column Index=2, Value=8, Value below=NaN, Value to the right = 9"
1/3       "Row index=1, Column Index=3, Value=9, Value below=NaN, Value to the right = 10"
1/4       "Row index=1, Column Index=4, Value=10, Value below=NaN, Value to the right = NaN"

Is there something like list comprehension or any other way to compare these two lists as quickly as possible?

I would rather not use for/while loops, as they are considered to be very slow.

EDIT: The solution will be the core of a big data function that has to handle millions of comparisons. Traditional for loops would slow my function down considerably, which is why I am looking for a faster way to do this.


Solution

  • If you want a list of strings, then you have to iterate over each element. String formatting like that works on scalar elements, not on whole arrays.

    It is possible to generate the values using whole array operations. But the result will be some form of array, not your list of strings.

    For example:

    In [198]: import numpy as np
         ...: list1 = [1, 2, 3, 4, 5]
         ...: list2 = [6, 7, 8, 9, 10]
    In [199]: arr = np.array([list1,list2])                                                        
    In [200]: arr                                                                                  
    Out[200]: 
    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10]])
    

    An easy way to iterate over the array while also getting the indices is np.ndenumerate:

    In [201]: list(np.ndenumerate(arr))                                                            
    Out[201]: 
    [((0, 0), 1),
     ((0, 1), 2),
     ((0, 2), 3),
     ((0, 3), 4),
     ((0, 4), 5),
     ((1, 0), 6),
     ((1, 1), 7),
     ((1, 2), 8),
     ((1, 3), 9),
     ((1, 4), 10)]
    

    This is still an iteration. We can get the indices in array form with:

    In [215]: np.indices(arr.shape)                                                                
    Out[215]: 
    array([[[0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1]],
    
           [[0, 1, 2, 3, 4],
            [0, 1, 2, 3, 4]]])
    In [216]: I,J = np.indices(arr.shape)
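For reference, np.indices gives the same row and column grids as np.meshgrid called with indexing='ij'. The small check below uses the same 2x5 shape as the session above; the variable names are only illustrative.

```python
import numpy as np

shape = (2, 5)  # same shape as arr in the session above

# np.indices returns one index grid per axis, stacked along a new first axis.
I, J = np.indices(shape)

# Equivalent construction with meshgrid; indexing="ij" keeps (row, column) order.
I2, J2 = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")

same = np.array_equal(I, I2) and np.array_equal(J, J2)
print(same)
```

Either form works here; np.indices is simply shorter when all you have is the array's shape.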
    

    And stack those indices with the values:

    In [218]: np.stack((I.ravel(),J.ravel(),arr.ravel()),axis=1)                                   
    Out[218]: 
    array([[ 0,  0,  1],
           [ 0,  1,  2],
           [ 0,  2,  3],
           [ 0,  3,  4],
           [ 0,  4,  5],
           [ 1,  0,  6],
           [ 1,  1,  7],
           [ 1,  2,  8],
           [ 1,  3,  9],
           [ 1,  4, 10]])
    

    To get the values below and to the right, we can pad the array with NaN along the bottom and right edges:

    In [223]: arr1 = np.pad(arr.astype(float),[(0,1),(0,1)],mode='constant',constant_values=np.nan)
         ...:                                                                                      
    In [224]: arr1                                                                                 
    Out[224]: 
    array([[ 1.,  2.,  3.,  4.,  5., nan],
           [ 6.,  7.,  8.,  9., 10., nan],
           [nan, nan, nan, nan, nan, nan]])
    

    Note that I had to convert the array to float to accept the float np.nan value.
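As an alternative to padding, the same neighbor arrays can be built by starting from all-NaN float arrays and copying in shifted slices. This is only a sketch of that idea, not part of the session above; the names below and right are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])

# np.full with np.nan produces float arrays, so NaN can be stored.
below = np.full(arr.shape, np.nan)
below[:-1, :] = arr[1:, :]   # every row except the last has a row below it

right = np.full(arr.shape, np.nan)
right[:, :-1] = arr[:, 1:]   # every column except the last has a column to its right

print(below)
print(right)
```

Both approaches need a float result for the same reason: an integer array has no way to represent NaN.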

    And combining everything:

    In [225]: np.stack((I.ravel(),J.ravel(),arr.ravel(),arr1[1:,:-1].ravel(),arr1[:-1,1:].ravel()),
         ...: axis=1)                                                                              
    Out[225]: 
    array([[ 0.,  0.,  1.,  6.,  2.],
           [ 0.,  1.,  2.,  7.,  3.],
           [ 0.,  2.,  3.,  8.,  4.],
           [ 0.,  3.,  4.,  9.,  5.],
           [ 0.,  4.,  5., 10., nan],
           [ 1.,  0.,  6., nan,  7.],
           [ 1.,  1.,  7., nan,  8.],
           [ 1.,  2.,  8., nan,  9.],
           [ 1.,  3.,  9., nan, 10.],
           [ 1.,  4., 10., nan, nan]])
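The steps so far can be collected into one function. This is a minimal sketch under the same assumptions as the session (a 2-D numeric input, NaN for missing neighbors); the name neighbor_table is hypothetical.

```python
import numpy as np

def neighbor_table(values):
    """Return one row per element: (row, col, value, below, right).

    Missing neighbors are NaN, so the result is always a float array.
    """
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D input")
    I, J = np.indices(arr.shape)
    # Pad one row at the bottom and one column on the right with NaN.
    padded = np.pad(arr, [(0, 1), (0, 1)], mode="constant", constant_values=np.nan)
    below = padded[1:, :-1]   # shifted up by one row
    right = padded[:-1, 1:]   # shifted left by one column
    return np.stack(
        (I.ravel(), J.ravel(), arr.ravel(), below.ravel(), right.ravel()), axis=1
    )

table = neighbor_table([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(table.shape)
```

Nothing in this function loops in Python, so the cost of building the table is carried by NumPy's array operations.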
    

    To get your list of strings, we can define a format string:

    In [230]: astr = "Row index={}, Column index={}, Value={}, Value below={}, Value to the right={}"  
    

    and apply it to each row (yes, this does iterate):

    In [233]: for row in Out[225]: 
         ...:     print(astr.format(*row)) 
         ...:                                                                                      
    Row index=0.0, Column index=0.0, Value=1.0, Value below=6.0, Value to the right=2.0
    Row index=0.0, Column index=1.0, Value=2.0, Value below=7.0, Value to the right=3.0
    Row index=0.0, Column index=2.0, Value=3.0, Value below=8.0, Value to the right=4.0
    Row index=0.0, Column index=3.0, Value=4.0, Value below=9.0, Value to the right=5.0
    Row index=0.0, Column index=4.0, Value=5.0, Value below=10.0, Value to the right=nan
    Row index=1.0, Column index=0.0, Value=6.0, Value below=nan, Value to the right=7.0
    Row index=1.0, Column index=1.0, Value=7.0, Value below=nan, Value to the right=8.0
    Row index=1.0, Column index=2.0, Value=8.0, Value below=nan, Value to the right=9.0
    Row index=1.0, Column index=3.0, Value=9.0, Value below=nan, Value to the right=10.0
    Row index=1.0, Column index=4.0, Value=10.0, Value below=nan, Value to the right=nan
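The float table prints indices and values as 0.0, 1.0 and so on. If the output should look like the integers and NaN in the question, each entry can be converted while formatting. The sketch below assumes the underlying data are integers; the helper show is hypothetical, and the list comprehension is still a Python-level iteration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=float)
I, J = np.indices(arr.shape)
padded = np.pad(arr, [(0, 1), (0, 1)], mode="constant", constant_values=np.nan)
table = np.stack((I.ravel(), J.ravel(), arr.ravel(),
                  padded[1:, :-1].ravel(), padded[:-1, 1:].ravel()), axis=1)

astr = "Row index={}, Column index={}, Value={}, Value below={}, Value to the right={}"

def show(v):
    # Hypothetical helper: render NaN as "NaN", anything else as an integer.
    return "NaN" if np.isnan(v) else str(int(v))

# One formatted string per table row.
lines = [astr.format(*map(show, row)) for row in table]
print(lines[0])
print(lines[4])
```

For millions of elements, building and printing the strings is likely to dominate the runtime, so it may be worth keeping the numeric table and formatting only the rows that are actually needed.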
    

    If we omit all those ravel calls and stack along a new last axis, we get a 3d array of values:

    In [234]: np.stack((I,J,arr,arr1[1:,:-1],arr1[:-1,1:]),axis=2)                                 
    Out[234]: 
    array([[[ 0.,  0.,  1.,  6.,  2.],
            [ 0.,  1.,  2.,  7.,  3.],
            [ 0.,  2.,  3.,  8.,  4.],
            [ 0.,  3.,  4.,  9.,  5.],
            [ 0.,  4.,  5., 10., nan]],
    
           [[ 1.,  0.,  6., nan,  7.],
            [ 1.,  1.,  7., nan,  8.],
            [ 1.,  2.,  8., nan,  9.],
            [ 1.,  3.,  9., nan, 10.],
            [ 1.,  4., 10., nan, nan]]])
    In [235]: _.reshape(-1,5)           # to get the 2d array
     ...