I need to create a data interface that can query the same data from an Excel file, an API, or our DB.
What would be the best structure for this, and how is it normally set up to avoid manually switching imports depending on whether we need data from the Excel file, the API, or the DB?
You can use a Driver/Factory pattern here. Basically, you write data drivers that fetch the data from the different endpoints. In all cases the data is the same.
This is a standard OOP use case, and abstractions play a vital role in such designs. What you need is a standard interface/abstraction for the known operations, implemented by each of the drivers.
In your case, you know the data is a concrete object, and you need a loader (which is synonymous with a driver) that produces it.
So, define the data object. For example:
class MyData:
    def __init__(self, *args, **kwargs):
        # TODO- Accept the relevant args for data object here!
        pass
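As one possible shape for the data object, here is a minimal sketch using a dataclass. The field names `source` and `rows` are assumptions made for illustration only; your actual data will dictate the fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MyData:
    # Hypothetical fields; replace them with whatever your data contains.
    source: str
    rows: List[Dict[str, Any]] = field(default_factory=list)


# Every loader would build the same object, whatever the origin of the rows.
data = MyData(source="excel", rows=[{"id": 1, "name": "widget"}])
print(data.source, len(data.rows))
```

A dataclass gives you `__init__`, `__repr__`, and equality for free, which can make it easier to check that two loaders return the same data.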
You could have any add-ons here. Now, what you need is a data loader, which is abstracted so that you can decide on the loader implementation at run time. So, first, decide on an abstraction, which could be something like below:
from abc import ABC, abstractmethod


class AbstractDataLoader(ABC):
    # Deriving from ABC is what makes @abstractmethod enforced.
    def __init__(self, *args, **kwargs):
        pass

    @abstractmethod
    def load(self, *args, **kwargs) -> MyData:
        pass
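One detail worth knowing: `@abstractmethod` is only enforced when the class derives from `abc.ABC` (or uses `ABCMeta` as its metaclass). The sketch below illustrates this; `IncompleteLoader` is a hypothetical subclass invented for the demonstration, and the return type is simplified to `dict` so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class AbstractDataLoader(ABC):
    @abstractmethod
    def load(self) -> dict:
        """Return the loaded data."""


class IncompleteLoader(AbstractDataLoader):
    # Hypothetical subclass that forgets to implement load().
    pass


try:
    IncompleteLoader()
except TypeError as exc:
    # Instantiation fails because load() is still abstract.
    print(f"rejected: {exc}")
```

This way a driver that forgets to implement `load` fails as soon as it is instantiated, rather than silently returning `None` later.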
The skeleton is defined. Now you need to define the various data loaders, which pick the data up from different endpoints such as a DB, a file, or an API. Let's create some implementations like below:
class DBDataLoader(AbstractDataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # load db connections, other configs

    def load(self, *args, **kwargs) -> MyData:
        # TODO- Load data from DB
        pass


class ExcelDataLoader(AbstractDataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # load excel files, other configs

    def load(self, *args, **kwargs) -> MyData:
        # TODO- Load data from Excel
        pass


class APIDataLoader(AbstractDataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # load api connections, other configs

    def load(self, *args, **kwargs) -> MyData:
        # TODO- Load data from API
        pass
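To make one of these concrete, here is a minimal sketch of a filled-in file loader. It reads a CSV file with the standard library as a stand-in for Excel, since reading real `.xlsx` files would need a third-party package such as openpyxl or pandas. `CsvDataLoader`, its `path` parameter, and the `rows` attribute on `MyData` are assumptions for illustration.

```python
import csv
from abc import ABC, abstractmethod


class MyData:
    def __init__(self, rows):
        # Hypothetical shape: a list of dicts, one per record.
        self.rows = rows


class AbstractDataLoader(ABC):
    @abstractmethod
    def load(self) -> MyData:
        ...


class CsvDataLoader(AbstractDataLoader):
    def __init__(self, path):
        # Configuration is captured at construction time.
        self.path = path

    def load(self) -> MyData:
        # DictReader uses the header row as keys; all values are strings.
        with open(self.path, newline="", encoding="utf-8") as handle:
            return MyData(list(csv.DictReader(handle)))
```

The DB and API loaders would follow the same shape: take their configuration in `__init__`, and convert whatever the source returns into `MyData` inside `load`.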
Done: we have the data object and the drivers ready. Now it's about configuring and using a certain driver. This can be done either imperatively, using a factory approach like below:
class MyApp:
    def __init__(self, configured_loader):
        self.configured_loader = configured_loader

    def _resolve_loader(self):
        if self.configured_loader == 'db':
            return DBDataLoader()
        elif self.configured_loader == 'excel':
            return ExcelDataLoader()
        elif self.configured_loader == 'api':
            return APIDataLoader()
        # ....
        # Fail loudly instead of returning None for an unknown name.
        raise ValueError(f"Unknown loader: {self.configured_loader!r}")

    def load_data(self) -> MyData:
        return self._resolve_loader().load()


if __name__ == '__main__':
    import sys

    loader = sys.argv[1]
    app = MyApp(loader)
    data = app.load_data()
    # Do with it whatever you want!
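A common variation on the imperative factory is to replace the `if`/`elif` chain with a lookup table. The sketch below is one way this might look; the `LOADERS` dict and `resolve_loader` function are hypothetical names, and the loader classes are reduced to stubs so the snippet runs on its own.

```python
class DBDataLoader:
    def load(self):
        return "data from db"


class ExcelDataLoader:
    def load(self):
        return "data from excel"


class APIDataLoader:
    def load(self):
        return "data from api"


# Hypothetical registry: maps the configured name to a loader class.
LOADERS = {
    "db": DBDataLoader,
    "excel": ExcelDataLoader,
    "api": APIDataLoader,
}


def resolve_loader(name):
    try:
        loader_class = LOADERS[name]
    except KeyError:
        raise ValueError(
            f"Unknown loader {name!r}; expected one of {sorted(LOADERS)}"
        ) from None
    return loader_class()


print(resolve_loader("excel").load())
```

Adding a new source then means adding one entry to the table rather than another branch.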
Or, a better approach is to do it declaratively, using configuration such as an env file. E.g., define an env file like app.env with some definitions like
myapp:data-loader=loaders.APIDataLoader
myapp:data-loader:api:endpoint=https://some-server/api/v1/data
..
..
..
And use a library like python-dotenv to make it available at runtime, and then load the data using the class directly.
For example:
import os
import importlib

from dotenv import load_dotenv


class MyApp:
    def __init__(self):
        self.configured_loader = os.getenv("myapp:data-loader")

    def _resolve_loader(self):
        package_name, class_name = self.configured_loader.rsplit('.', 1)
        module = importlib.import_module(package_name)
        driver_class = getattr(module, class_name)
        # .... as of now, it creates an instance of APIDataLoader
        return driver_class()

    def load_data(self) -> MyData:
        return self._resolve_loader().load()


if __name__ == '__main__':
    # Loads the configs from app.env..
    load_dotenv(dotenv_path='app.env')
    app = MyApp()
    data = app.load_data()
    # Do with it whatever you want!
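Because the declarative approach imports whatever dotted path the configuration names, it may be worth validating the result before instantiating it. The following is a minimal sketch of such a check; `resolve_class` is a hypothetical helper, and the demo uses standard-library classes as stand-ins so it runs without the loader modules.

```python
import importlib


def resolve_class(dotted_path, expected_base):
    """Import 'package.module.ClassName' and check it derives from expected_base."""
    if not dotted_path or "." not in dotted_path:
        # Covers a missing env variable (None) and a bare class name.
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, expected_base)):
        raise TypeError(
            f"{dotted_path} is not a subclass of {expected_base.__name__}"
        )
    return cls


# Stand-in demo with stdlib classes; in the app you would pass
# os.getenv("myapp:data-loader") and AbstractDataLoader instead.
loader_class = resolve_class("collections.OrderedDict", dict)
print(loader_class.__name__)
```

With a check like this, a typo or an unset variable in app.env produces a clear error at startup instead of an `AttributeError` on `None`.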
This summarizes a simple but extensible approach to your problem.