python pandas validation conceptual

What is the optimal approach for validating object types of method arguments in Python?


This isn't a specific issue with code but more of an open, conceptual question, so I hope it's in the right place.

I have a pandas dataframe, and I often subset data on bounding times and other optional variables, here frequency. The frequency has discrete values, so I can select data from a single channel or from multiple channels. The function I have looks something like this:

def subset_data(data, times, freq=None):

    sub_data = data.loc[data['time'].between(*times), :]

    if freq is not None:
   
        if isinstance(freq, int):

            sub_data = sub_data.loc[sub_data['frequency'] == freq, :]

        elif isinstance(freq, tuple):

            sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
                
    return sub_data

I wanted to modify the second condition to be a more general check for any numeric type, and I found this question - What is the most pythonic way to check if an object is a number?. The accepted answer made me question my approach here and its validity in general. The last point, in particular:

If you are more concerned about how an object acts rather than what it is, perform your operations as if you have a number and use exceptions to tell you otherwise.

I interpret this as implying I should do something like this

def subset_data(data, times, freq=None):

    sub_data = data.loc[data['time'].between(*times), :]

    if freq is not None:

        try:
   
            if isinstance(freq, tuple):

                sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
       
            elif isinstance(freq, int):

                sub_data = sub_data.loc[sub_data['frequency'] == freq, :]

        except TypeError:

            print('sub_data filtered on time only. freq must be numeric.')

    return sub_data

or

if isinstance(freq, tuple):

    sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]
       
elif isinstance(freq, int):

    sub_data = sub_data.loc[sub_data['frequency'] == freq, :]

else:

    raise TypeError('freq must be tuple or numeric')

but would be interested to know if that's anything close to the consensus.

The original is also missing some validation for completeness - I'm too lazy to write this in my own code and feel like it adds unnecessary clutter if I assume that I'll be the only one using it and have a priori knowledge of the types. If this was not the case, I would include:

if isinstance(freq, int):

    sub_data = sub_data.loc[sub_data['frequency'] == freq, :]

elif isinstance(freq, tuple) and len(freq) == 2:

    if isinstance(freq[0], int) and isinstance(freq[1], int):

        sub_data = sub_data.loc[sub_data['frequency'].between(*freq), :]

Is the practice of checking explicitly for the object type and attributes, and this approach to validation in general, appropriate in Python, or is my knowledge lacking somewhere? Maybe everything could be written more concisely, and I'm missing something. There are technically two questions in this post, but I hope the general, overarching concept is clear enough to be useful for others and allow for some answers.
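To make the "any numeric type" check and the extra tuple validation described above concrete, here is a minimal sketch using only the standard library. The helper names `_is_number` and `normalize_freq` are hypothetical and not part of the original function, and excluding `bool` and accepting lists as well as tuples are assumptions about what the caller wants.

```python
from numbers import Real


def _is_number(value):
    """Return True for real numbers, excluding bool (a subclass of int)."""
    return isinstance(value, Real) and not isinstance(value, bool)


def normalize_freq(freq):
    """Return freq as a (low, high) tuple, or None if no filter was requested.

    Hypothetical helper: a single number n becomes (n, n); a pair of
    numbers given as a tuple or list is returned as a tuple.
    """
    if freq is None:
        return None
    if _is_number(freq):
        return (freq, freq)
    if (
        isinstance(freq, (tuple, list))
        and len(freq) == 2
        and all(_is_number(item) for item in freq)
    ):
        return tuple(freq)
    # Fail loudly with a message that names the offending value.
    raise TypeError(f"freq must be a number or a pair of numbers, got {freq!r}")
```

With the input normalized this way, the filtering code would need only one branch, for example `sub_data['frequency'].between(*bounds)`, because `between` includes both endpoints by default and so `(n, n)` selects the same rows as `== n`.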


Solution

  • If I'm not mistaken, with the second approach (try/except), an incorrect incoming type just gives you a generic TypeError, without pinpointing what exactly caused the problem in the code. With the first approach, you harden the check by explicitly testing for the two accepted types, int and tuple, which is good. I don't have a strong preference and both approaches seem fine to me, although in the except clause you could make the message more detailed to get a more specific error log (if any).

    A good way to understand exceptions, if you want to, is to look at examples of KeyError, which is raised when you try to access a key that doesn't exist in a dictionary, and then print(e), where e is the caught KeyError. Hope this helps somewhat. Cheers.
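The KeyError suggestion above can be sketched as follows. The dictionary `channels` and the function `lookup` are made-up names for illustration only.

```python
# Hypothetical example data: channel names mapped to frequencies.
channels = {"low": 10, "high": 20}


def lookup(name):
    """Return the frequency for a channel name, or None if it is unknown."""
    try:
        return channels[name]
    except KeyError as e:
        # e holds the missing key; printing it shows the key's repr.
        print(f"unknown channel: {e}")
        return None


lookup("mid")  # prints: unknown channel: 'mid'
```

Binding the exception with `as e` is what gives the handler access to the details, which is the same mechanism you would use to write a more specific message in the `except TypeError` branch of the question's code.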