I implemented the following design using an abstract class and a subclass:
from abc import ABC, abstractmethod
import pandas as pd

class Pipeline(ABC):
    @abstractmethod
    def read_data(self):
        pass

    def __init__(self, **kwargs):
        self.raw_data = self.read_data()
        self.process_data = self.raw_data[self.used_cols]

class case1(Pipeline):
    def read_data(self):
        return pd.read_csv("file location")  # just hard coding for the file location

    @property
    def used_cols(self):
        return ['col_1', 'col_2', 'col_3', 'col_4']
I can instantiate case1 as follows. It reads a CSV file into a pandas DataFrame.
data = case1()
This existing design returns four hard-coded columns ('col_1', 'col_2', 'col_3' and 'col_4') and works fine. Now I would like to control which columns are returned by modifying the subclass, specifically the used_cols function. I modified class case1 as follows, but it raises an error.
class case1(Pipeline):
    def read_data(self):
        return pd.read_csv("file location")  # just hard coding for the file location

    @property
    def used_cols(self, selected_cols):
        return selected_cols
It was called as follows:
selected_cols = ['col_2','col_3']
data = case1(selected_cols)
It turns out that this modification is not right: it generates an error message such as TypeError: __init_subclass__() takes no keyword arguments. So my question is: how should I modify the subclass to get the desired control?
I think you did not fully understand the purpose of properties.
If you create a property used_cols, you access it using obj.used_cols instead of obj.used_cols(). The getter only ever receives self, so it cannot take an extra argument such as selected_cols, and once the property exists it is not easy to call the underlying function directly.
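To make the difference concrete, here is a minimal sketch. The class name Demo is hypothetical and not part of the question's code; it only shows that a property is read like an attribute, and that adding parentheses calls whatever the getter returned rather than the getter itself.

```python
class Demo:
    @property
    def used_cols(self):
        # The getter only ever receives `self`; no other argument can be passed.
        return ['col_1', 'col_2']


obj = Demo()
print(obj.used_cols)  # attribute-style access runs the getter

try:
    obj.used_cols()  # this tries to call the returned list, not the getter
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The second access fails because the list returned by the getter is not callable, which is why a parameter on the getter can never be supplied.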
Assuming file_location.csv contains:

col_0,col_1,col_2,col_3
1,1,1,2
2,3,3,4
3,3,3,6
A property with a setter, assigned from a keyword argument in __init__, gives you that control:

from abc import ABC, abstractmethod
import pandas as pd

class Pipeline(ABC):
    @abstractmethod
    def read_data(self):
        pass

    def __init__(self, **kwargs):
        self.raw_data = self.read_data()
        # Goes through the used_cols setter defined on the subclass
        self.used_cols = kwargs["selected_cols"]
        self.process_data = self.raw_data[self.used_cols]

class case1(Pipeline):
    def read_data(self):
        return pd.read_csv("file_location.csv")  # just hard coding for the file location

    @property
    def used_cols(self):
        return self._used_cols

    @used_cols.setter
    def used_cols(self, selected_cols):
        self._used_cols = selected_cols

selected_cols = ['col_2', 'col_3']
data = case1(selected_cols=selected_cols)  # must be passed by keyword
print(data.process_data)
   col_2  col_3
0      1      2
1      3      4
2      3      6
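One thing to watch: kwargs["selected_cols"] raises a KeyError when case1() is called with no arguments, so the original hard-coded behaviour is lost. A possible variation is sketched below under some assumptions. default_cols is a hypothetical class attribute, the subclass is renamed Case1 for the sketch, and a plain dict of lists stands in for the DataFrame so the snippet runs without pandas or a CSV file.

```python
from abc import ABC, abstractmethod


class Pipeline(ABC):
    # Hypothetical class attribute: columns used when the caller passes none.
    default_cols = []

    @abstractmethod
    def read_data(self):
        pass

    def __init__(self, selected_cols=None):
        self.raw_data = self.read_data()
        # Fall back to the subclass defaults when no selection is given.
        if selected_cols is None:
            selected_cols = self.default_cols
        self.used_cols = list(selected_cols)
        # With a DataFrame this would be self.raw_data[self.used_cols].
        self.process_data = {col: self.raw_data[col] for col in self.used_cols}


class Case1(Pipeline):
    default_cols = ['col_1', 'col_2', 'col_3']

    def read_data(self):
        # Stand-in for pd.read_csv(...): same values as the sample file.
        return {
            'col_0': [1, 2, 3],
            'col_1': [1, 3, 3],
            'col_2': [1, 3, 3],
            'col_3': [2, 4, 6],
        }


print(Case1().process_data)  # uses default_cols
print(Case1(selected_cols=['col_2', 'col_3']).process_data)  # uses the caller's choice
```

Here used_cols is an ordinary instance attribute, so no property is needed at all; the property with a setter is only worth keeping if you want to validate or transform the column list on assignment.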