I know there are a lot of questions about circular imports. I've looked at many of them, but I can't figure out how to apply their answers to this scenario.
I have a pair of data loading classes for importing data from an Excel document with multiple sheets. Each sheet is configurable, so the sheet name and the individual column names can change (but they have defaults defined in the class attributes).
There are other inter-class references between the two classes in question, but I figured this example was the most straightforward:
One of the features of a separate export script is to use the loaders' metadata to populate an Excel template (with multiple sheets). That template has comments on the column headers in each sheet that reference the other sheets, because the contents of some sheets are used to populate dropdowns in other sheets.
So a comment in a header in one sheet may say "This column's dropdown data is populated by the contents of column X in sheet 2". And sheet 2 column X's header would have a comment that says "This column's contents are used to populate dropdowns for column Y in sheet 1."
I went ahead and added the respective imports, knowing I would end up with a circular import issue, but I figured I would get everything conceptually established as to what I wanted to do and then try and solve the import issue.
Here's some toy code to try and boil it down:
infusates_loader.py:

    from DataRepo.loaders.tracers_loader import TracersLoader

    class InfusatesLoader(TableLoader):
        DataColumnMetadata = DataTableHeaders(
            TRACERNAME=TableColumn.init_flat(
                ...
                source_sheet=TracersLoader.DataSheetName,
                source_column=TracersLoader.DataHeaders.NAME,
            ),
        )

tracers_loader.py:

    from DataRepo.loaders.infusates_loader import InfusatesLoader

    class TracersLoader(TableLoader):
        DataColumnMetadata = DataTableHeaders(
            NAME=TableColumn.init_flat(
                ...
                # Cannot reference the InfusatesLoader here (to include the name of its
                # sheet and its tracer name column) due to circular import
                target_sheet=InfusatesLoader.DataSheetName,
                source_column=InfusatesLoader.DataHeaders.TRACERNAME,
            ),
        )

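To make the failure concrete, here is a minimal, self-contained reproduction. It is only a sketch: the module names `demo_infusates_loader` and `demo_tracers_loader` and the stripped-down classes are hypothetical stand-ins for the real loaders, written to a temporary directory so the whole thing runs from one file.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins for the two loader modules (not the real DataRepo code).
INFUSATES_SRC = textwrap.dedent('''
    from demo_tracers_loader import TracersLoader

    class InfusatesLoader:
        DataSheetName = "Infusates"
        # Class-level reference: needs TracersLoader to exist at import time
        source_sheet = TracersLoader.DataSheetName
''')

TRACERS_SRC = textwrap.dedent('''
    from demo_infusates_loader import InfusatesLoader

    class TracersLoader:
        DataSheetName = "Tracers"
        # Class-level reference back to the other module
        target_sheet = InfusatesLoader.DataSheetName
''')


def try_circular_import():
    """Write both modules to a temp dir, import one, and return the error text (or None)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_infusates_loader.py").write_text(INFUSATES_SRC)
        Path(tmp, "demo_tracers_loader.py").write_text(TRACERS_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_infusates_loader")
            return None
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
            # Make sure no half-imported module lingers for a later rerun
            sys.modules.pop("demo_infusates_loader", None)
            sys.modules.pop("demo_tracers_loader", None)


error_message = try_circular_import()
print(error_message)
```

The second module asks for `InfusatesLoader` while the first module is still only partway through executing, so the name does not exist yet and the import fails.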
I was able to avoid the issue for now just by setting static string values in tracers_loader.py, but ideally, those values would only live in one place (each in its respective class).
A lot of the circular import questions out there have to do with methods, so I don't think they apply to class attributes? I tried using importlib and I tried doing the import inside a function, but as soon as Python tried to set up the class, I got hit with the import error.
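The likely reason the function-level import didn't help is that class-level assignments are evaluated while the `class` statement itself runs (that is, when the module is imported), whereas a function body only runs when it is called. A small sketch of that timing difference, using a hypothetical `lookup` helper and `EagerLoader` class that are not part of my real code:

```python
evaluation_log = []


def lookup(label):
    """Record when a value is computed; stands in for touching another module's class."""
    evaluation_log.append(label)
    return label


class EagerLoader:
    # Class-level expression: evaluated once, while the class statement executes
    source_sheet = lookup("class attribute")

    @classmethod
    def get_source_sheet(cls):
        # Function body: evaluated only when the method is called
        return lookup("method body")


log_after_definition = list(evaluation_log)
EagerLoader.get_source_sheet()
log_after_call = list(evaluation_log)

print(log_after_definition)
print(log_after_call)
```

So deferring an import into a function only defers it if nothing at class level calls that function. A value needed by a class attribute is needed at import time regardless.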
Answers to circular import questions often fall into a couple of categories, one of which is "rethink your design". Usually, however, that advice is not accompanied by any suggested design patterns.
Clearly, in my case, it was a design issue. I knew that was likely the case, but I couldn't see the forest for the trees. I was holding firm to the concept of (as @user2357112's comment pointed out) a "single source of truth". However, I was missing the fact that I had a conceptual option that was true to what I was modeling. I had a class for each sheet in my excel document, but I was missing a class for the document itself, where I could put the inter-sheet relationships (and the definition of the list of sheets in the document). @user2357112 had put it in terms of a "config", but I realized that could easily be a sort of "superclass" or "coordinating class".
I'm just about to start creating that class, and I'm certain a class that defines the inter-sheet relationships is what's called for here.
I don't know what to call such a design pattern or what specific form it will take, but conceptually, that was what I was missing.
So, to give an example, what I need is something like:
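study_doc.py:

    class StudyDoc:
        # Note: these must be assignments (=). Written with a colon they would
        # only be annotations, and the attributes would never be set.
        infusates_tracer_reference = {
            "sheet": "Tracers",
            "column": "Name",
        }
        tracers_infusate_reference = {
            "sheet": "Infusates",
            "column": "Tracer Name",
        }
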
Then I can import that in both of the other classes:
infusates_loader.py:

    from DataRepo.loaders.study_doc import StudyDoc

    class InfusatesLoader(TableLoader):
        DataColumnMetadata = DataTableHeaders(
            TRACERNAME=TableColumn.init_flat(
                ...
                source_sheet=StudyDoc.infusates_tracer_reference["sheet"],
                source_column=StudyDoc.infusates_tracer_reference["column"],
            ),
        )

tracers_loader.py:

    from DataRepo.loaders.study_doc import StudyDoc

    class TracersLoader(TableLoader):
        DataColumnMetadata = DataTableHeaders(
            NAME=TableColumn.init_flat(
                ...
                target_sheet=StudyDoc.tracers_infusate_reference["sheet"],
                source_column=StudyDoc.tracers_infusate_reference["column"],
            ),
        )

I probably will make it a bit more sophisticated, but this is the basic idea: All these relationships are based on the fact that they all belong to the same excel document. It's that document that is the basis for the relationships, so it should coordinate their connections.
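As a rough idea of what "a bit more sophisticated" might look like, here is a single-file sketch. Everything in it beyond the sheet and column names is an assumption for illustration: the `ColumnRef` dataclass, the `DROPDOWN_LINKS` tuple, and the `source_of`/`targets_of` helpers are hypothetical, and the two loader classes are bare stand-ins rather than real `TableLoader` subclasses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical helper: names one column in one sheet."""
    sheet: str
    column: str


class StudyDoc:
    """Coordinating class: owns the sheet/column names and the links between sheets."""

    TRACERS_NAME = ColumnRef(sheet="Tracers", column="Name")
    INFUSATES_TRACER_NAME = ColumnRef(sheet="Infusates", column="Tracer Name")

    # (target, source) pairs: the target column's dropdown is filled from the source column
    DROPDOWN_LINKS = (
        (INFUSATES_TRACER_NAME, TRACERS_NAME),
    )

    @classmethod
    def source_of(cls, target):
        """Return the column that populates `target`'s dropdown, or None."""
        for tgt, src in cls.DROPDOWN_LINKS:
            if tgt == target:
                return src
        return None

    @classmethod
    def targets_of(cls, source):
        """Return every column whose dropdown is populated from `source`."""
        return [tgt for tgt, src in cls.DROPDOWN_LINKS if src == source]


# Stand-in loaders: each depends only on StudyDoc, never on the other loader.
class InfusatesLoader:
    DataSheetName = StudyDoc.INFUSATES_TRACER_NAME.sheet
    tracer_name_source = StudyDoc.source_of(StudyDoc.INFUSATES_TRACER_NAME)


class TracersLoader:
    DataSheetName = StudyDoc.TRACERS_NAME.sheet
    name_targets = StudyDoc.targets_of(StudyDoc.TRACERS_NAME)


print(InfusatesLoader.tracer_name_source)
print(TracersLoader.name_targets)
```

Because each relationship is declared once as a pair, both directions of the header comments ("populated by" and "used to populate") can be derived from the same entry, and the import graph stays one-way: both loaders import the document class, and it imports neither of them.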