python-3.x pandas dataframe switch-statement pandas-apply

Issues Converting Python Code Block to Function

There is a block of code I use regularly in my analysis to standardize the description of the types of devices used by customers to access an internet provider's services. The block of code is as follows:

# Standardize devices_desc labels
###-- SMARTPHONE
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    ["SMART PHONE", "SMARTPHONE"], "SMARTPHONE"
)
###-- FEATURE_PHONE
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    ["FEATURE PHONE"], "FEATURE_PHONE"
)
###-- BASIC_PHONE
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    ["BASIC PHONE", "BASIC"], "BASIC_PHONE"
)
###-- TABLET
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    ["TABLETS", "TABLET"], "TABLET"
)
###-- MODEM/GSM_GATEWAY
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    [
        "MODEM/GSM GATEWAY",
        "DONGLE",
        "PLUGGABLE CARD (E.G. USB STICK)",
        "MODEM/GSM GATEWAY",
    ],
    "MODEM/GSM_GATEWAY",
)
###-- M2M_EQUIPMENT
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    ["M2M EQUIPMENT"], "M2M_EQUIPMENT"
)
###-- NA/UNKNOWN
devices_df["devices_desc"] = devices_df["devices_desc"].replace(
    [np.nan, "UNKNOWN", "NA", "OTHER", "-"], "UNDEFINED"
)

devices_df is a dataframe and devices_desc is a column in devices_df. I use pandas (Anaconda distribution) for my analysis. I have decided to convert this code block into a function to make it reusable across all the files I use for the analysis. Below is my initial attempt:

def fix_cust_device_type(devices_desc):
    if devices_desc in ["BASIC PHONE", "BASIC"]:
        return "BASIC_PHONE"
    if devices_desc in ["FEATURE PHONE"]:
        return "FEATURE_PHONE"
    if devices_desc in ["SMART PHONE", "SMARTPHONE"]:
        return "SMARTPHONE"
    if devices_desc in ["TABLETS", "TABLET"]:
        return "TABLET"
    if devices_desc in [
        "MODEM/GSM GATEWAY",
        "DONGLE",
        "PLUGGABLE CARD (E.G. USB STICK)",
        "MODEM/GSM GATEWAY",
    ]:
        return "MODEM/GSM_GATEWAY"
    if devices_desc in ["M2M EQUIPMENT"]:
        return "M2M_EQUIPMENT"
    else:
        return "UNDEFINED"

I tried to apply the function as follows:

devices_df["devices_desc"] = devices_df["devices_desc"].apply(fix_cust_device_type)

However, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-abd75c9eeb58> in <module>
----> 1 devices_df["devices_desc"] = GSM_Data["devices_desc"].apply(fix_cust_device_type)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   4198             else:
   4199                 values = self.astype(object)._values
-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4201 
   4202         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-23-bad83bc1b381> in fix_cust_device_type(devices_desc)
      1 def fix_cust_device_type(devices_desc):
----> 2     if devices_desc in ["BASIC PHONE", "BASIC"]:
      3         return "BASIC_PHONE"
      4 
      5     if devices_desc in ["FEATURE PHONE"]:

pandas\_libs\missing.pyx in pandas._libs.missing.NAType.__bool__()

TypeError: boolean value of NA is ambiguous

My efforts to find the cause of the error have been unsuccessful. I would like to understand the following:

The causes of the error
How to rectify the error
The pythonic way of implementing my proposed solution

Kindly assist. Thank you.

Solution

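It's hard to tell without any data (or an MWE), but the error message suggests there are missing values (pd.NA) in your dataframe. The `in` test compares devices_desc with each list element using ==; comparing pd.NA with a string returns pd.NA rather than True or False, and Python cannot convert pd.NA to a boolean, hence the TypeError raised from NAType.__bool__ in your traceback.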
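To see the mechanism in isolation, here is a minimal sketch. The helper name membership_raises is made up for illustration. It also shows that a float NaN does not trigger the error, since NaN == "some string" is simply False:

```python
import numpy as np
import pandas as pd

# Comparing pd.NA with a string propagates NA instead of giving True/False.
na_comparison = pd.NA == "BASIC"


def membership_raises(value):
    """Return True if `value in [...]` raises TypeError (hypothetical helper)."""
    try:
        value in ["BASIC PHONE", "BASIC"]
    except TypeError:
        return True
    return False


print(na_comparison is pd.NA)
print(membership_raises(pd.NA))   # pd.NA cannot be turned into a bool
print(membership_raises(np.nan))  # float NaN compares as plain False
```

So the function fails only when the column actually holds pd.NA, not when it holds np.nan.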

When I run your code on a simple example, everything works fine, e.g.:

df = pd.DataFrame({"devices_desc": ["BASIC", "DONGLE"]})
df["devices_desc"].apply(fix_cust_device_type)

# Out:
# 0          BASIC_PHONE
# 1    MODEM/GSM_GATEWAY

But when I include missing data I get the error you've posted:

df = pd.DataFrame({"devices_desc": ["BASIC", pd.NA]})
df["devices_desc"].apply(fix_cust_device_type)

# --> TypeError: boolean value of NA is ambiguous

Therefore you should check your data. If NA values are ok, you should handle them in fix_cust_device_type, e.g. by adding the following code at the beginning of the function:

if pd.isna(devices_desc):
    return "NA"  # or any string that suits your needs
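For reference, a sketch of the whole function with that guard in place. Returning "UNDEFINED" for missing values is an assumption here, chosen to match what the original replace block does with NaN; substitute any label you prefer:

```python
import numpy as np
import pandas as pd


def fix_cust_device_type(devices_desc):
    # Handle missing values first so the membership tests never see pd.NA.
    if pd.isna(devices_desc):
        return "UNDEFINED"  # assumed label, mirrors the original replace block
    if devices_desc in ("BASIC PHONE", "BASIC"):
        return "BASIC_PHONE"
    if devices_desc == "FEATURE PHONE":
        return "FEATURE_PHONE"
    if devices_desc in ("SMART PHONE", "SMARTPHONE"):
        return "SMARTPHONE"
    if devices_desc in ("TABLETS", "TABLET"):
        return "TABLET"
    if devices_desc in ("MODEM/GSM GATEWAY", "DONGLE", "PLUGGABLE CARD (E.G. USB STICK)"):
        return "MODEM/GSM_GATEWAY"
    if devices_desc == "M2M EQUIPMENT":
        return "M2M_EQUIPMENT"
    # Anything unrecognised falls through to the default, as in the question.
    return "UNDEFINED"


df = pd.DataFrame({"devices_desc": ["BASIC", "DONGLE", pd.NA, np.nan]})
cleaned = df["devices_desc"].apply(fix_cust_device_type)
print(cleaned.tolist())
```

Note that, as in the question's version, any value not listed (including already standardized labels such as "BASIC_PHONE") ends up as "UNDEFINED".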

If NA values are not ok, you should drop them, e.g. with df.dropna() or df.dropna(subset=["devices_desc"]).
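Either way, it may help to look at how many values are missing before deciding. A small sketch on made-up data (the frame below is illustrative, not your data):

```python
import pandas as pd

# Toy data with two kinds of missing value.
df = pd.DataFrame({"devices_desc": ["BASIC", pd.NA, "DONGLE", None]})

# How many entries are missing, and what the column contains overall.
n_missing = int(df["devices_desc"].isna().sum())
counts = df["devices_desc"].value_counts(dropna=False)
print(n_missing)
print(counts)

# Drop only the rows where devices_desc itself is missing.
df_clean = df.dropna(subset=["devices_desc"]).reset_index(drop=True)
print(df_clean)
```

Using subset keeps rows that are missing values in other columns, which plain df.dropna() would also remove.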

Another way of dealing with your issue would be the following:

Convert your function to a dictionary

# This is a short version just for showcase
replace_dict = {'BASIC': 'BASIC_PHONE', 'DONGLE': 'MODEM/GSM_GATEWAY'}

Use the replace method with the created dict (there is no need for apply, and it works with missing values)

df = pd.DataFrame({"devices_desc": ["BASIC", "DONGLE", pd.NA]})
df["devices_desc"] = df["devices_desc"].replace(replace_dict)

# Content of df:
#         devices_desc
# 0        BASIC_PHONE
# 1  MODEM/GSM_GATEWAY
# 2               <NA>
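A fuller sketch of the dictionary approach, wrapped in a reusable function. The names DEVICE_GROUPS, REPLACE_DICT and standardize_device_desc are made up for illustration; the mapping itself is taken from the replace block in the question, and filling missing values with "UNDEFINED" is an assumption based on that block:

```python
import pandas as pd

# Standard label -> raw spellings that should be folded into it
# (taken from the replace block in the question).
DEVICE_GROUPS = {
    "SMARTPHONE": ["SMART PHONE"],
    "FEATURE_PHONE": ["FEATURE PHONE"],
    "BASIC_PHONE": ["BASIC PHONE", "BASIC"],
    "TABLET": ["TABLETS"],
    "MODEM/GSM_GATEWAY": ["MODEM/GSM GATEWAY", "DONGLE", "PLUGGABLE CARD (E.G. USB STICK)"],
    "M2M_EQUIPMENT": ["M2M EQUIPMENT"],
    "UNDEFINED": ["UNKNOWN", "NA", "OTHER", "-"],
}

# Flatten to raw spelling -> standard label, the shape replace() expects.
REPLACE_DICT = {raw: label for label, raws in DEVICE_GROUPS.items() for raw in raws}


def standardize_device_desc(series):
    """Return a copy of `series` with standardized device labels."""
    # Fill missing values first, then map the known raw spellings.
    return series.fillna("UNDEFINED").replace(REPLACE_DICT)


df = pd.DataFrame(
    {"devices_desc": ["SMART PHONE", "TABLETS", "DONGLE", "-", pd.NA, "SMARTPHONE", "SOMETHING ELSE"]}
)
df["devices_desc"] = standardize_device_desc(df["devices_desc"])
print(df)
```

Unlike the if-chain in the question, replace leaves values that are not in the dictionary unchanged (here "SOMETHING ELSE"), which matches the behaviour of the original replace block.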