I have a question about Python. I build paragraph vectors using doc2vec and convert them to a time series. The index is the date, there are 8 companies, and for each day each company has a vector of dimension 100 representing its news articles. However, on some days there are no articles, which produces NaN values. I would like to replace those with a zero vector of dimension 100.
I tried to do that with this code snippet:
test_df.fillna(value=np.zeros(100), inplace=True)
However, that doesn't work because I can't replace the NaN with a list or an array. Is there a way to fix this problem?
Thank you very much!
Perhaps you can try:
zeros = np.zeros(100)
series.apply(lambda x: x if isinstance(x, np.ndarray) else zeros)
For an example of what this looks like (with only vectors of length 2 to keep things clear):
import numpy as np
import pandas as pd
series = pd.Series({1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])})
series
1    [1, 2]
2       NaN
3    [3, 4]
dtype: object
zeros = np.zeros(2)
series = series.apply(lambda x: x if isinstance(x, np.ndarray) else zeros)
series
1        [1, 2]
2    [0.0, 0.0]
3        [3, 4]
dtype: object
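One thing to be aware of: with the lambda above, every filled cell points at the same zeros object, so changing one filled vector in place would change all of them. If that matters for your use case, here is a minimal sketch that creates a fresh array for each missing entry. The helper name fill_missing and its dim parameter are made up for illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_missing(x, dim=2):
    # Keep existing vectors; give every missing entry its own new zero vector.
    return x if isinstance(x, np.ndarray) else np.zeros(dim)

series = pd.Series({1: np.array([1, 2]), 2: np.nan, 3: np.nan})
filled = series.apply(fill_missing)

# The filled entries are separate arrays, so editing one leaves the other alone.
filled.loc[2][0] = 5.0
```

For the 100-dimensional case you could pass the size through, for example series.apply(fill_missing, dim=100).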
If your data is in a DataFrame then a similar pattern with applymap should work:
df = pd.DataFrame({'company_a': {1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])}, 'company_b': {1: np.nan, 2: np.array([9, 7]), 3: np.nan}})
df
  company_a company_b
1    [1, 2]       NaN
2       NaN    [9, 7]
3    [3, 4]       NaN
zeros = np.zeros(2)
df = df.applymap(lambda x: x if isinstance(x, np.ndarray) else zeros)
df
    company_a   company_b
1      [1, 2]  [0.0, 0.0]
2  [0.0, 0.0]      [9, 7]
3      [3, 4]  [0.0, 0.0]
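Note that in recent pandas releases (2.1 and later) applymap is deprecated in favour of DataFrame.map, which does the same elementwise job. A rough sketch that should work on both older and newer versions is below. The fill_missing helper and the df_filled name are my own for illustration, and the toy frame uses vectors of length 2 as before.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'company_a': {1: np.array([1, 2]), 2: np.nan, 3: np.array([3, 4])},
    'company_b': {1: np.nan, 2: np.array([9, 7]), 3: np.nan},
})

def fill_missing(x, dim=2):
    # Keep existing vectors; replace anything else (e.g. NaN) with a zero vector.
    return x if isinstance(x, np.ndarray) else np.zeros(dim)

# Prefer DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
df_filled = elementwise(fill_missing)
```

With the real data you would call it with dim=100, for example elementwise(lambda x: fill_missing(x, dim=100)).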