For my current job I am writing some long-running (think hours to days) scripts that do CPU-intensive data processing. The program flow is very simple: it enters the main loop, completes the main loop, saves the output and terminates. The basic structure of my programs tends to be like so:
<import statements>
<constant declarations>
<misc function declarations>

def main():
    for blah in blahs():
        <lots of local variables>
        <lots of tightly coupled computation>

        for something in somethings():
            <lots more local variables>
            <lots more computation>

        <etc., etc.>

    <save results>

if __name__ == "__main__":
    main()
This gets unmanageable quickly, so I want to refactor it into something more maintainable without sacrificing execution speed.
However, each chunk of code relies on a large number of variables, so refactoring parts of the computation out into functions would make the parameter lists grow out of hand very quickly. Should I put this sort of code into a Python class, and change the local variables into class variables? It doesn't make a great deal of sense to me conceptually to turn the program into a class, as the class would never be reused, and only one instance would ever be created per run.
What is the best-practice structure for this kind of program? I am using Python, but the question is relatively language-agnostic, assuming a language with modern object-oriented features.
First off, if your program is going to be running for hours or days, then the overhead of switching to classes and methods instead of putting everything in a giant main is pretty much negligible.
Additionally, refactoring (even if it does involve passing a lot of variables) should help you improve speed in the long run. Profiling a well-designed application is much easier because you can pinpoint the slow parts and optimize there. Maybe a new library comes along that's highly optimized for your calculations; a well-designed program will let you plug it in and test right away. Or perhaps you decide to write a C extension module to speed up a subset of your calculations; a well-designed application will make that easy too.
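To illustrate the profiling point, here is a minimal sketch using the standard library's cProfile and pstats. The functions load, transform and profile_main are hypothetical stand-ins (they are not from the question); the idea is only that once the work is split into named functions, the report attributes time to each of them rather than to one giant main.

```python
import cProfile
import io
import pstats


def load(n):
    # Hypothetical stand-in for a data-loading step.
    return list(range(n))


def transform(values):
    # Hypothetical stand-in for a CPU-heavy step.
    return [v * v for v in values]


def main():
    return sum(transform(load(10_000)))


def profile_main():
    # Run main() under the profiler and return its result plus a text report.
    profiler = cProfile.Profile()
    profiler.enable()
    result = main()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_main()
    print(report)
```

Each extracted function gets its own line in the report, so a slow step stands out by name. With everything inlined in main, the same report would show little more than main itself and the library calls it makes.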
It's hard to give concrete advice without seeing <lots of tightly coupled computation> and <lots more computation>. But I would start by making every for block its own method and go from there.
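As a rough illustration of that first step, the sketch below moves each loop body into its own function and groups the shared variables into one small state object, so the parameter lists stay short. Everything here is an assumption made for the example: RunState, threshold, process_blah, process_something and the arithmetic are placeholders, and blahs() and somethings() simply return small ranges in place of the real data.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    # Hypothetical container for values that many steps share.
    threshold: float = 0.5
    results: list = field(default_factory=list)


def blahs():
    # Stand-in for the real outer iterable.
    return range(3)


def somethings():
    # Stand-in for the real inner iterable.
    return range(2)


def process_something(state, blah, something):
    # Formerly the body of the inner for loop.
    return blah * something * state.threshold


def process_blah(state, blah):
    # Formerly the body of the outer for loop.
    total = sum(process_something(state, blah, s) for s in somethings())
    state.results.append(total)


def main():
    state = RunState()
    for blah in blahs():
        process_blah(state, blah)
    # <save results> would go here; returning them keeps the sketch testable.
    return state.results


if __name__ == "__main__":
    print(main())
```

Passing one state object around is a middle ground between a dozen positional parameters and turning the whole script into a class. If the steps later turn out to share most of their state, the functions can become methods on that object with very little change.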