python function class organization

Python: How to organize a big function that relies on a lot of external data to work


Question: How should I organize a big function that relies on a lot of external data to work? Should I declare a class to hold that external data, or should I keep the big function and its data together in one file? Or are there better ways of doing it? What is the most computationally efficient way, and what is the most Pythonic, recommended way?

I have a log file to parse, and it contains strings in many formats. I wrote a parseLine(inputStr) function to deal with all possible formats. The parseLine() function requires many precompiled regexes and a fairly big dictionary for lookups. I keep the parseLine() function in a file, parseLineFile.py.

My parseLineFile.py looks like:

import re

regex0 = re.compile('foo')
regex1 = re.compile('bar')
# and many more regexes

set0 = {'f', '0'}
set1 = {'b', 'a'}  # could be a big set containing tens of strings
# and many more sets

def parseLine(inputString, inputDictionary, inputTimeCriteria):
    # pseudocode:
    #   use regex0 to extract date info from inputString
    #   check whether the date is within inputTimeCriteria
    #   use more of the previously declared regexes and sets to extract more info,
    #       branching out to different routines that use more regexes and sets
    #   finally use inputDictionary to look up the meaning of the extracted info
    #   return the results in some data structure
    ...  # body omitted

In my main code, I import parseLineFile.py, build myDictionary, decide on mytimeCriteria, and then use parseLine() to parse a file line by line.

I feel that my question is ... not stack-overflow-ic. If you want to leave a comment on how I should ask a narrower, more specific question, that's great! But please also at least mention how you would approach my problem.


Solution

  • It's hard to tell you exactly what to do for this specific function, but here are some tips on organizing big functions:

    First, identify what conditionals can be moved to their own function. For example, let's say you have this code:

    if 'foo' in inputString:
        line = regex()
        line = do_something_else()
    elif 'bar' in inputString:
        line = regex()
        line = do_something_a_little_different()
    

    One abstraction stands out here: move the body of each if block into its own function. You would create parseFoo and parseBar functions, each of which takes a line and returns the expected value.

    The main benefit of this is now you have extremely simple functions to unit test with!
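    As a rough sketch of that extraction, the block below splits the two branches into small functions and checks each one on its own. The regex patterns, the sample log lines, and the snake_case names parse_foo, parse_bar, and parse_line are illustrative assumptions, not taken from the asker's real log format.

```python
import re

# Hypothetical patterns; the real regexes depend on the actual log format.
FOO_RE = re.compile(r"foo=(\w+)")
BAR_RE = re.compile(r"bar=(\d+)")


def parse_foo(line):
    """Handle 'foo' lines: return the captured word, or None if absent."""
    match = FOO_RE.search(line)
    return match.group(1) if match else None


def parse_bar(line):
    """Handle 'bar' lines: return the captured number, or None if absent."""
    match = BAR_RE.search(line)
    return int(match.group(1)) if match else None


def parse_line(line):
    """Pick the handler for this line; each branch is now a single call."""
    if "foo" in line:
        return parse_foo(line)
    elif "bar" in line:
        return parse_bar(line)
    return None


# Each small function can be tested in isolation with one-line checks.
assert parse_foo("ts=1 foo=abc") == "abc"
assert parse_bar("ts=2 bar=42") == 42
assert parse_line("nothing here") is None
```

    The module-level compiled regexes stay where the asker already keeps them; only the branching logic moves into named functions.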

    Other things I watch out for are:

    • Are you nesting many conditionals? Extract them into a function and return early to reduce nesting.
    • If you're repeating yourself with different inputs, extract the repeated code into a function.
    • Mentally scan the function a day later and see whether you still understand it easily. If not, break it into smaller pieces.
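    To illustrate the return-early tip, here is a minimal sketch built on the date check described in the question. The ISO date format at the start of the line, the helper extract_date, and the start/end parameters are assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical format: an ISO date at the start of the line, then the message.
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(.*)$")


def extract_date(line):
    """Return (date, rest_of_line), or None when no valid leading date exists."""
    match = DATE_RE.match(line)
    if match is None:
        return None
    year, month, day, rest = match.groups()
    try:
        return date(int(year), int(month), int(day)), rest
    except ValueError:
        return None  # digits matched but do not form a real calendar date


def parse_line(line, start, end):
    """Return (date, message) for lines dated within [start, end], else None."""
    extracted = extract_date(line)
    if extracted is None:
        return None  # no usable date: bail out early
    when, rest = extracted
    if not (start <= when <= end):
        return None  # outside the time window: bail out early
    # The main path stays at the top indentation level.
    return when, rest.strip()
```

    Each guard handles one reason to give up, so the remaining parsing steps do not need to sit inside nested if blocks.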

    Anyway, more detail from you would be ideal, but I hope this helps get you started!