
How to use the zip function to generate cell locations from row and column indices


I created the following DataFrame and would like to identify the cells that are null:

import pandas as pd
import numpy as np

# np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

     a    b     c
0    1    2     3
1   10   NaN    
2   10         NaN
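Note that the data mixes real missing values (NaN) with empty strings, and isna() treats only the former as null. The sketch below shows both masks side by side; treating "" as missing is an assumption about what you want, not something pandas does for you.

```python
import numpy as np
import pandas as pd

data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

# isna() flags only real missing values (NaN/None); empty strings are not null.
mask = df.isna()
print(mask)

# Assumption: if "" should also count as missing, OR in an equality test for "".
mask_with_blanks = df.isna() | df.eq("")
print(mask_with_blanks)
```

With the first mask, np.where finds only (1, 1) and (2, 2); with the second, the empty-string cells (1, 2) and (2, 1) are included as well.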

I used the following code:

x1 = np.where(pd.isnull(df))
print(x1)

and got this result:

(array([1, 2], dtype=int64), array([1, 2], dtype=int64))
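To make the shape of x1 concrete: np.where on a 2-D boolean mask returns one array per axis, so x1[0] holds row positions and x1[1] holds column positions, aligned element by element. A minimal sketch, rebuilding the same df; np.argwhere is shown only as a related NumPy function that returns the same positions already paired up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}])

# One array per axis: row positions first, then column positions.
x1 = np.where(df.isna())
rows, cols = x1
print(rows, cols)

# np.argwhere stacks the same positions as one [row, col] pair per line.
pairs = np.argwhere(df.isna().to_numpy())
print(pairs)
```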

However, I want to generate the cell location explicitly for each NaN entry. I tried the zip function, but got the following error:

print(set(zip(x1)))



 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 print(set(zip(x1)))

TypeError: unhashable type: 'numpy.ndarray'

What is the right way to generate the location information explicitly from x1?
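The error comes from how zip is called. zip(x1) receives a single iterable (the tuple), so each item it yields is a 1-tuple wrapping an entire ndarray, and set() cannot hash an ndarray. The sketch below uses a hand-written stand-in for x1 to reproduce the failure and contrast it with unpacking.

```python
import numpy as np

# Stand-in for x1: the (row positions, column positions) tuple from np.where.
x1 = (np.array([1, 2]), np.array([1, 2]))

# zip(x1) zips ONE iterable, so each item is a 1-tuple holding a whole array.
wrapped = list(zip(x1))
print(len(wrapped), type(wrapped[0][0]))

# Hashing that 1-tuple means hashing the ndarray inside it, which fails.
error_message = ""
try:
    set(zip(x1))
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# zip(*x1) is zip(x1[0], x1[1]): it pairs the i-th row with the i-th column.
cells = set(zip(*x1))
print(cells)
```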


Solution

  • You could use numpy.where on the boolean mask returned by DataFrame.isna():

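    import numpy as np
    # np.where on the boolean mask returns (row positions, column positions)
    null_indices, col_idx = np.where(df.isna())
    # Map the column positions to column names
    null_columns = df.columns[col_idx]
    print((null_indices, null_columns))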
    

    Output:

    (array([1, 2], dtype=int64), Index(['b', 'c'], dtype='object'))
    

    If you want the locations as (row, column) tuples, zip the two results together:

    out = list(zip(null_indices, null_columns))
    

    Output:

    [(1, 'b'), (2, 'c')]
    

    For your specific code: since x1 is a tuple of two arrays, unpack it with * so that zip pairs the row array with the column array:

    out = list(zip(*x1))
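Two caveats with the positional approach: np.where returns positions, not index labels, and the pairs contain NumPy integers rather than plain ints. The sketch below gives df hypothetical labels ('r0', 'r1', 'r2') to show the difference, converts positions to labels via df.index and df.columns, and also shows a DataFrame.stack() variant that yields (label, column) pairs directly. Both are illustrative alternatives, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical index labels, to show why positions and labels can differ.
df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3},
                   {'a': 10, 'b': np.nan, 'c': ""},
                   {'a': 10, 'b': "", 'c': np.nan}],
                  index=['r0', 'r1', 'r2'])

x1 = np.where(df.isna())

# Positional pairs, converted to plain Python ints for a cleaner repr.
positions = [(int(r), int(c)) for r, c in zip(*x1)]

# Translate positions into (index label, column name) pairs.
labels = list(zip(df.index[x1[0]], df.columns[x1[1]]))

# Alternative without np.where: stack the boolean mask into a Series keyed by
# (row label, column name), then keep only the True entries.
stacked = df.isna().stack()
labels_via_stack = stacked[stacked].index.tolist()

print(positions)
print(labels)
print(labels_via_stack)
```

With the default RangeIndex, positions and labels coincide, which is why the original output shows (1, 'b') and (2, 'c').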