python python-3.x pandas nan fillna

Python: change NaN to a vector of zeros


I have a question about Python. I built paragraph vectors using doc2vec and converted them to a time series. I have an index, which is the date, and 8 companies, and for each day there is a vector of dimension 100 for each company to represent the news articles. However, on some days there are no articles, which gives NaN values. I would like to replace each of those with a zero vector of dimension 100.

I tried to do that using this code snippet:

test_df.fillna(value=np.zeros(100), inplace=True)

However, that doesn't work because fillna does not accept a list or an array as the replacement value. Is there a way to fix this problem?
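
For reference, a minimal sketch of the failure, assuming a small hypothetical frame in place of test_df (one company, vectors of length 2 instead of 100). The exact exception type and message may differ between pandas versions, so the sketch only records that an error is raised:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_df: one company, vectors of length 2
test_df = pd.DataFrame(
    {"company_a": {1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])}}
)

try:
    test_df.fillna(value=np.zeros(2))
    fillna_error = None
except (TypeError, ValueError) as exc:
    # pandas rejects an array as the fill value
    fillna_error = exc

print(type(fillna_error).__name__, fillna_error)
```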

Thank you very much!


Solution

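  • fillna does not take an array as the fill value, so perhaps you can try apply instead and swap in the zero vector wherever an entry is not an array:

    zeros = np.zeros(100)  # replacement for days without an article
    # apply returns a new Series, so assign the result back
    series = series.apply(lambda x: x if isinstance(x, np.ndarray) else zeros)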
    

    For an example of what this looks like (with only vectors of length 2 to keep things clear):

    import numpy as np
    import pandas as pd
    series = pd.Series({1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])})
    series
    
    1    [1, 2]
    2       NaN
    3    [3, 4]
    dtype: object
    
    zeros = np.zeros(2)
    series = series.apply(lambda x: x if isinstance(x, np.ndarray) else zeros)
    series
    
    1        [1, 2]
    2    [0.0, 0.0]
    3        [3, 4]
    dtype: object
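
    One caveat: the lambda above returns the same zeros object for every missing entry, so a later in-place change to one filled vector would show up in all of them. A minimal sketch that creates a fresh zero vector per missing entry instead (fill_missing and its dim parameter are hypothetical names, not part of pandas):

```python
import numpy as np
import pandas as pd

series = pd.Series({1: np.array([1, 2]), 2: np.nan, 3: np.nan})

def fill_missing(x, dim=2):
    # Hypothetical helper: keep existing vectors, otherwise build a new zero vector
    return x if isinstance(x, np.ndarray) else np.zeros(dim)

filled = series.apply(fill_missing)

# Each missing day owns its own array, so this edit does not leak into day 3
filled.loc[2][0] = 5.0
print(filled.loc[2], filled.loc[3])
```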
    

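    If your data is in a DataFrame then a similar pattern with applymap should work (in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which is called the same way):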

    df = pd.DataFrame({
        'company_a': {1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])},
        'company_b': {1: np.nan, 2: np.array([9, 7]), 3: np.nan},
    })
    df
    
      company_a company_b
    1    [1, 2]       NaN
    2       NaN    [9, 7]
    3    [3, 4]       NaN
    
    zeros = np.zeros(2)
    df = df.applymap(lambda x: x if isinstance(x, np.ndarray) else zeros)
    df
    
        company_a   company_b
    1      [1, 2]  [0.0, 0.0]
    2  [0.0, 0.0]      [9, 7]
    3      [3, 4]  [0.0, 0.0]
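
    The DataFrame pattern can also be sketched so that it runs on both older and newer pandas, by picking whichever element-wise method the installed version provides. This assumes the same toy frame as above; fill_missing and elementwise are hypothetical names introduced only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'company_a': {1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])},
    'company_b': {1: np.nan, 2: np.array([9, 7]), 3: np.nan},
})

def fill_missing(x, dim=2):
    # Hypothetical helper: keep existing vectors, otherwise build a new zero vector
    return x if isinstance(x, np.ndarray) else np.zeros(dim)

# DataFrame.map is available from pandas 2.1; fall back to applymap before that
elementwise = df.map if hasattr(df, "map") else df.applymap
filled = elementwise(fill_missing)
print(filled)
```

    For the real data, dim would be 100 rather than 2.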