How to avoid redefining a class that differs from another class only in the data type conversion of its inputs and outputs?


I have a class with two methods, 'fit' and 'transform'. They are designed to work with one specific data type (a multi-index pandas DataFrame), but I would also like to use the class with another data type (a 3D NumPy array). Copying and pasting the class just to add a conversion of the input and output data does not seem like a good approach.

What is the DRY best practice here? Should I use decorators on the methods?

Below is a simple pseudo-code sketch of what I mean:

import numpy as np
import pandas as pd


class Transformer2d:

  def __init__(self):
    pass

  def fit(self, X, y):
    self.foo = np.mean(X['0'])
    return self

  def transform(self, X):
    X['0'] = X['0'] / self.foo
    return X


class Transformer3d:
  """
  I would rather not create this class, because it is very similar
  to the previous one except for the data type conversion of inputs and outputs.
  """

  def __init__(self):
    pass

  def fit(self, X, y):
    X_ = threedim2twodim(X)  # difference with the previous class
    self.foo = np.mean(X_['0'])
    return self

  def transform(self, X):
    X_ = threedim2twodim(X)  # difference with the previous class
    X_['0'] = X_['0'] / self.foo
    return twodim2threedim(X_, X.shape[0])  # difference with the previous class


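# data type conversion functions
def threedim2twodim(X: np.ndarray):
    # Swap the last two axes, then flatten the first two into rows:
    # (n_samples, a, b) -> (n_samples * b, a)
    return X.swapaxes(2, 1).reshape(-1, X.shape[1])


def twodim2threedim(X: np.ndarray, n_samples: int = -1):
    # Inverse of threedim2twodim: (n_samples * b, a) -> (n_samples, a, b)
    return X.reshape(n_samples, -1, X.shape[1]).swapaxes(1, 2)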

Solution

  • You can use functools.singledispatchmethod for this. A simple demonstration is overloading the __init__() method of a class so that it accepts either integers or floats and behaves differently depending on the input type. Note that dispatch happens only on the type of the first argument after self (here, x).

    from functools import singledispatchmethod
    
    class Point:
        @singledispatchmethod
        def __init__(self, x: int, y: int):
            self.x = x
            self.y = y
    
        @__init__.register
        def _(self, x: float, y: float):
            self.x = int(x)
            self.y = int(y)
    
        def norm(self):
            return self.x**2 + self.y**2
    
        
    integer_p = Point(3,4)
    print(f"Int norm: {integer_p.norm()}")
    
    float_p = Point(3.1, 4.2)
    print(f"Float norm: {float_p.norm()}")
    

    This way you don't have to do any type checking yourself with isinstance.
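
    Applied to the transformer from the question, the same idea might look like the sketch below. It is not a definitive implementation and rests on a few assumptions that are not in the question: the 2D data is a plain (not multi-index) DataFrame whose columns are named "0", "1", ..., the conversion helpers are adapted to return and accept a DataFrame, and the single class name Transformer is made up for illustration. The DataFrame overload holds the real logic, and the ndarray overload only converts and delegates.

```python
from functools import singledispatchmethod

import numpy as np
import pandas as pd


def threedim2twodim(X: np.ndarray) -> pd.DataFrame:
    # Assumption: axis 1 of the 3D array becomes the columns "0", "1", ...
    flat = X.swapaxes(2, 1).reshape(-1, X.shape[1])
    return pd.DataFrame(flat, columns=[str(i) for i in range(X.shape[1])])


def twodim2threedim(X: pd.DataFrame, n_samples: int = -1) -> np.ndarray:
    # Inverse of threedim2twodim.
    values = X.to_numpy()
    return values.reshape(n_samples, -1, values.shape[1]).swapaxes(1, 2)


class Transformer:  # hypothetical name; replaces Transformer2d and Transformer3d
    @singledispatchmethod
    def fit(self, X, y=None):
        # Fallback for any type without a registered overload.
        raise NotImplementedError(f"Unsupported input type: {type(X).__name__}")

    @fit.register
    def _(self, X: pd.DataFrame, y=None):
        # The actual logic lives only here.
        self.foo = X["0"].mean()
        return self

    @fit.register
    def _(self, X: np.ndarray, y=None):
        # Convert, then delegate to the DataFrame overload.
        return self.fit(threedim2twodim(X), y)

    @singledispatchmethod
    def transform(self, X):
        raise NotImplementedError(f"Unsupported input type: {type(X).__name__}")

    @transform.register
    def _(self, X: pd.DataFrame):
        X = X.copy()  # avoid mutating the caller's frame
        X["0"] = X["0"] / self.foo
        return X

    @transform.register
    def _(self, X: np.ndarray):
        # Convert in, delegate, convert back out.
        X_2d = self.transform(threedim2twodim(X))
        return twodim2threedim(X_2d, X.shape[0])


X3 = np.arange(24, dtype=float).reshape(2, 3, 4) + 1
model = Transformer().fit(X3)
out = model.transform(X3)
print(out.shape)  # (2, 3, 4)
```

    The same instance also accepts a DataFrame directly, so the conversion code is written once and the fit/transform logic is never duplicated.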