I have
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")
giving
      x     y        z
0   red   NaN      NaN
1  blue   NaN      NaN
2   NaN   NaN      NaN
3   NaN  cold      NaN
4   NaN  warm      NaN
5   NaN   NaN    charm
6   NaN   NaN  strange
I would like to add a new categorical column with unordered values red, blue, hot, cold, warm, charm, strange, filled in appropriately. I have many such columns, not just three.
Some possibilities:
astype(str), concatenating, and then re-creating a categorical?
union_categoricals, then casting each column to that type, and then serially calling fillna() on them?
I can't make those or anything else work.
Notes:
Using .astype(pd.CategoricalDtype(ordered=True)) in place of .astype("category") when defining df also works with the answer below.
New Solution
For large datasets, the following solution may be more efficient:
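def my_fun(x):
    # x is one row of the object array; m marks its non-null entries
    m = ~pd.isnull(x)
    if m.any():
        # return the non-null entries (one value per row in this data)
        return x[m]
    else:
        return np.nan

df['new'] = np.apply_along_axis(my_fun, 1, df.to_numpy())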
      x     y        z      new
0   red   NaN      NaN      red
1  blue   NaN      NaN     blue
2   NaN   NaN      NaN      NaN
3   NaN  cold      NaN     cold
4   NaN  warm      NaN     warm
5   NaN   NaN    charm    charm
6   NaN   NaN  strange  strange
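If the per-row Python call in apply_along_axis is still a bottleneck, it can probably be avoided with plain array indexing. This is only a sketch and not part of the original answer: it assumes each row holds at most one non-null value (otherwise the first one is taken), and the names arr, mask, first and vals are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")

# Work on a plain object array so values from different
# categorical columns can be mixed freely.
arr = df.to_numpy(dtype=object)

# True where a cell holds a value, False where it is NaN.
mask = pd.notna(arr)

# Column position of the first non-null cell in each row.
# For an all-NaN row argmax returns 0, and that cell is NaN anyway.
first = mask.argmax(axis=1)

# Pick one cell per row, then turn the result into a categorical.
vals = arr[np.arange(len(arr)), first]
df["new"] = pd.Categorical(vals)
```

The categories here are inferred from the values that actually occur, so they come out in sorted order rather than in the order of the source columns.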
Edited answer
As the OP specified, there may be rows where all values are NaN. In that case we could try the following solution:
df['new_col'] = df.dropna(how='all').apply(lambda x: x.loc[x.first_valid_index()], axis=1)
df['new_col'] = pd.Categorical(df.new_col)
df
      x     y        z  new_col
0   red   NaN      NaN      red
1  blue   NaN      NaN     blue
2   NaN   NaN      NaN      NaN
3   NaN  cold      NaN     cold
4   NaN  warm      NaN     warm
5   NaN   NaN    charm    charm
6   NaN   NaN  strange  strange
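Note that pd.Categorical(df.new_col) infers its categories from the values present. The union_categoricals idea from the question can apparently also be made to work if the new column should share one dtype built from the source columns. The following is a sketch under that assumption, not the answerer's method; cols, shared and new are illustrative names.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals

df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")

cols = ["x", "y", "z"]  # the source columns; extend as needed

# Build one unordered dtype holding the categories of every column.
union = union_categoricals([df[c] for c in cols])
shared = pd.CategoricalDtype(union.categories, ordered=False)

# Cast the first column to the shared dtype, then fill its gaps
# from each remaining column in turn (also cast to the shared dtype).
new = df[cols[0]].astype(shared)
for c in cols[1:]:
    new = new.fillna(df[c].astype(shared))

df["new_col"] = new
```

Categories that never occur in the data, such as hot in the question, would still have to be added explicitly, for example with df["new_col"].cat.add_categories(["hot"]).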