Pandas: what is truth value condition when dealing with missing data

I have a function that creates a ratio. It is defined as

def create_ratio(data,num,den):
    if data[num].isnull():
        ratio = -9997
    if data[den].isnull():
        ratio = -9998
    if data[num].isnull() & data[den].isnull():
        ratio = -9999
    else:
        ratio = data[num]/data[den]
    return ratio

I have pandas dataframe (df_credit) which includes credit card balance (cc_bal) and limit (cc_limit) and I want to calculate credit card utilization which is balance over limit

df_credit['cc_util'] = create_ratio(df_credit,'cc_bal','cc_limit')

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-d53809a7690d> in <module>
----> 1 data['ratio_cc_util'] = create_ratio(data,'open_credit_card_credit_limit_nomiss','open_credit_card_credit_limit_nomiss')
      2 data['ratio_cc_util'].hist()

<ipython-input-65-99bc55b184ed> in create_ratio(data, num, den)
      1 def create_ratio(data,num,den):
----> 2     if data[num].isnull():
      3         ratio = -9997
      4     if data[den].isnull():
      5         ratio = -9998

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1441     def __nonzero__(self):
   1442         raise ValueError(
-> 1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1445         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the solution to this error? Thanks.

Solution

you are mixing between scalars and series, where your function needs to return a series or array given its calling context
as simple way as any to implement this conditional logic is np.select()
have simulated data, including missing values to meet your use cases

df = pd.DataFrame({
        "cc_bal": np.random.uniform(200, 1000, 200),
        "cc_limit": np.random.uniform(800, 1200, 200),})

df.loc[np.unique(np.random.choice(range(len(df)), 30)), "cc_bal"] = None
df.loc[np.unique(np.random.choice(range(len(df)), 30)), "cc_limit"] = None


def create_ratio(df, num, den):
    return np.select(
        [
            df[num].isnull() & df[den].isnull(),
            df[num].isnull(),
            df[den].isnull(),
        ],
        [-9999, -9997, -9998],
        df[num] / df[den],
    )


df["ratio"] = create_ratio(df, "cc_bal", "cc_limit")
df

sample output

	cc_bal	cc_limit	ratio
0	372.633	981.996	0.379465
1	845.541	1133.69	0.745831
2	449.406	975.903	0.460503
3	209.827	922.829	0.227374
4	237.347	936.654	0.253398
5	351.154	nan	-9998
6	nan	873.671	-9997
7	803.396	861.791	0.93224
8	591.136	807.176	0.732352
9	675.397	847.059	0.797344