I have a number of python modules with different functionality which I want to run on their own or as part of a larger data pipeline. I've organised the code to do this with the following rough layout:
dataProcessing.py

import argparse
...

def main(args):
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--arg1")
    ...
    args = parser.parse_args()
    main(args)
dataPipeline.py

import argparse
import dataProcessing

def main(args):
    dataProcessing.main(args)
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--arg1")
    ...
    args = parser.parse_args()
    main(args)
In this way I can arrange the modules in a pipeline and also run them separately, which is necessary for the project. It does mean the arguments have to be defined in both modules, which I can live with. That's fine for command-line use, but harder when I come back to the files some time later. Is there some way to have a file with the arguments already defined, so it's easy to return to and easier to edit than typing them on the command line? I think a config file would be suitable, but I'm a bit doubtful about my general approach, so any advice on best practice would be really appreciated.
Yes, you can just define a new .py file where you define your arguments/parameters.
For instance, create a file params.py and inside this file you would define a variable like:
var = 'sample_string.txt'
In your other Python files you would import params, for instance like this:
import params as p
and then you can use the arguments as such:
samplefilename_from_params_file = p.var
This allows you to have all arguments condensed in one file.
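If you combine this with argparse, a minimal sketch (assuming the params.py file sits next to the pipeline modules and exposes an arg1 value, both names made up here) could look like:

# params.py -- one place for the shared defaults
arg1 = "sample_string.txt"

# dataProcessing.py
import argparse
import params

def main(args):
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # the command line still overrides, but the default lives in params.py
    parser.add_argument("--arg1", default=params.arg1)
    args = parser.parse_args()
    main(args)

That way each module can still run on its own, and dataPipeline.py only has to import the same params module to stay in sync.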
However, there are other methods to deal with that and have a real configuration file. That could be an INI file or a YAML file.
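As a rough sketch of the INI variant, using only the standard library's configparser (the file name pipeline.ini and its section/keys are made up for illustration):

# pipeline.ini would contain:
# [dataProcessing]
# arg1 = sample_string.txt

import argparse
import configparser

config = configparser.ConfigParser()
config.read("pipeline.ini")

parser = argparse.ArgumentParser()
# a value given on the command line wins; otherwise fall back to the config file
parser.add_argument("--arg1",
                    default=config.get("dataProcessing", "arg1", fallback=None))
args = parser.parse_args()

A YAML file works the same way, but you would need the third-party PyYAML package (yaml.safe_load) to read it.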