I often write and participate in the development of web applications in Python together with FastAPI. I always use the same folder structure which is as follows:
I think the names are clear to know what is in each of them. But I want to focus on two folders: api and services. Within api are all the modules with the endpoints of the application (one module for each "Entity", for example in the module api/routers/users.py will be all the endpoints for the users. In these endpoints there is no complex logic , all the logic is in the "services" folder. So, in the module services/users.py there will be a whole class called User with the logic. In many cases that class is simply a CRUD that inherits from a BASE CRUD and doesn't even extend the class Or maybe add one or two more methods. However, in some cases, the "entity" requires much more complex logic, it is appropriate to implement some design patterns, interfaces etc. When that happens I feel overwhelmed by having to put everything in the services/users.py module. (which implies a very large file). I've even seen how other developers continue to extend the User class (it's just an example, it can be anything) with a lot of methods that have nothing to do with the class, making the code too coupled and with low cohesion. As a solution to that, I thought of creating a folder as such for each entity and not a module. Then it would be services/user and there have all the user logic distributed in more than one module if necessary. But I'm not sure I'm doing the right thing in terms of design. Am I making things more complicated? Or is it a correct strategy?
overwhelmed by having to put everything in the services/users.py module.
Like methods and classes, a module should do one thing and do it well.
Look at the top of the users module. Does it have a """docstring"""? No? Well, add one.
Now, you're about to add a method for a new service, presumably something that is tightly bound to the User entity. Go read the first sentence of the docstring, which explains what the module is responsible for. Does your new service fit within its purview? If yes, append the implementation, otherwise find or invent another module to place it in.
EDIT
maybe I should just stop thinking that the modules should not be "considerably large". After all, there's nothing wrong with that.
Well, I don't completely agree with that, for three broad reasons.
If a module offers just one or two public symbols, and then there's a ton of sibling modules with somewhat similar functionality, that's not doing the Gentle Reader any favors. The clutter has simply been moved up a level, so one must search across modules instead of within a module.
I like the example of collections, which """implements specialized container datatypes""". Do I use each container equally often? No. But they make sense, they cohere together, and I know where to go looking.
Humans readily keep track of seven or so items.
Double or triple that can be fine.
I love to make
beautiful soup,
but the object that comes back has 133 public members.
(Ok, in fairness 96, after stripping out get_text vs getText synonyms.)
I assert that, like PL/I keywords and php builtins,
some modules such as the venerable numpy
(575)
and pandas (119) DataFrame
(208)
offer an embarrasment of riches, too many riches,
more than I keep track of or even know to go looking for.
And that's even after DataFrame
has admirably
gone to the trouble of hiding 139 _private
elements.
I'm glad they offer so much, and I'm not suggesting to re-org them at this point. But keep in mind that you don't necessarily want your design to end up like that, for the sake of some newly hired engineer who has to learn your Public API. If you break out even one or two subcategories, then when future maintenance engineers add One More Feature they will be guided by your example, and perhaps add the feature to an existing small category, or will be emboldened to invent a new category.
The minimum unit of granularity for accessing your library
is import
. Even if an app-level developer
does from numpy import NaN
, that still requires
a full parse of the numpy module, discarding all which is not NaN.
The full memory footprint of the module hangs out behind the scenes,
cached, in case other symbols might be requested.
A large module will tend to sprout a bunch of deps,
which in turn have transitive deps.
The app developer incurs the cost of all those nested import
dependencies.
So a large module does not come "for free", it always takes some milliseconds to import it. At the time you find yourself adding a new dependency which your new feature relies on, you should ask yourself, "is a new module appropriate?", "would some consumers only want A, without the expense of loading B?"
Python code is pretty portable, but sometimes the code or its deps will insist on a version range of some library, or a certain OS, or a recent interpreter.
Conda, poet, and pip do a pretty good job of untangling the proverbial DLL hell, but there's no 100% guarantee. If you can offer core services to some consumers of your library with minimal deps, you should, to ease their upgrade versionitis pains.