Question: How should I organize a big function that relies on a lot of external data to work? Should I declare a class that contains the external data? Should I keep the big function and its data in one file? Or are there better ways of doing it? What is the most computationally efficient way? What is the most Pythonic, recommended way?
I have a log file to parse, and the log file contains strings in many formats. I wrote a parseLine(inputStr) function to deal with all possible formats. The parseLine() function requires many precompiled regexes and quite a big dictionary for lookups. I kept the parseLine() function in a file, parseLineFile.py.
My parseLineFile.py looks like:
import re

regex0 = re.compile('foo')
regex1 = re.compile('bar')
# and many more regexes
set0 = {'f', '0'}
set1 = {'b', 'a'}  # could be a big set containing tens of strings
# and many more sets

def parseLine(inputString, inputDictionary, inputTimeCriteria):
    # pseudocode:
    # use regex0 to extract date info in inputString
    # check if date is within inputTimeCriteria
    # use more of the previously declared regexes and sets to extract more info,
    # branch out to different routines that use more regexes and sets to extract more info
    # finally use inputDictionary to look up the meaning of the extracted info
    # return results in some data structure
    ...
In my main code, I import parseLineFile.py, build myDictionary, decide on mytimeCriteria, and then use parseLine() to parse a file line by line.
I feel that my question is ... not stack-overflow-ic, but if you want to leave a comment on how I should ask a narrower, more specific question, that's great! Please also at least mention how you would approach my problem.
It's hard to specifically tell you what you should do for this specific function, but some tips in regards to organizing big functions:
First, identify what conditionals can be moved to their own function. For example, let's say you have this code:
if 'foo' in inputString:
    line = regex()
    line = do_something_else()
elif 'bar' in inputString:
    line = regex()
    line = do_something_a_little_different()
You can easily see one abstraction you could make here: move the functionality in each if block to its own function. You would create parseFoo and parseBar functions, each of which takes a line and returns an expected value.
The main benefit of this is now you have extremely simple functions to unit test with!
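As a rough sketch of that split, here is one way it might look. The regex patterns, the log formats, and the shape of the returned dictionaries are all made up for illustration, since the real formats aren't shown in the question; only the parseFoo/parseBar/parseLine names come from the discussion above.

```python
import re

# Hypothetical patterns: the real log formats are not shown in the question.
FOO_RE = re.compile(r"foo=(\d+)")
BAR_RE = re.compile(r"bar=(\w+)")


def parseFoo(line):
    """Handle a 'foo' line; return a dict, or None if the pattern is absent."""
    match = FOO_RE.search(line)
    if match is None:
        return None
    return {"kind": "foo", "value": int(match.group(1))}


def parseBar(line):
    """Handle a 'bar' line; return a dict, or None if the pattern is absent."""
    match = BAR_RE.search(line)
    if match is None:
        return None
    return {"kind": "bar", "value": match.group(1).upper()}


def parseLine(inputString):
    """Decide which format the line is in and delegate to a small helper."""
    if "foo" in inputString:
        return parseFoo(inputString)
    elif "bar" in inputString:
        return parseBar(inputString)
    return None
```

Each helper can now be tested with a one-line input and a one-line expectation, without building the whole dictionary or time criteria first.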
Other things I watch out for are:
- return early, to reduce nesting
Anyways, more input from you would be ideal, but I hope that helps to get you started!
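To illustrate the point about returning early, here is a minimal sketch of how the date check from the question could bail out before any of the heavier parsing runs. The date format, the DATE_RE pattern, and the idea that inputTimeCriteria is a (start, end) pair of dates are assumptions on my part, not something taken from your code.

```python
import re
from datetime import datetime

# Hypothetical format: an ISO date followed by the rest of the line.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*)$")


def parseLine(inputString, inputDictionary, inputTimeCriteria):
    # Assumption: inputTimeCriteria is a (start, end) tuple of date objects.
    start, end = inputTimeCriteria

    match = DATE_RE.match(inputString)
    if match is None:
        return None  # not a recognized line, stop here

    date = datetime.strptime(match.group(1), "%Y-%m-%d").date()
    if not (start <= date <= end):
        return None  # outside the time window, skip the expensive work

    # Only lines that passed both checks reach the lookup.
    code = match.group(2).strip()
    return {"date": date, "meaning": inputDictionary.get(code, "unknown")}
```

The main path stays at one indentation level, and lines that fail a cheap check never touch the remaining regexes or the dictionary.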