
Faulty D Dependency Logic in SCons


I've tracked down a bug in the dependency logic for D sources in SCons.

The self.cre regexp import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*; in SCons.Scanner.D doesn't cover patterns such as...

import IMPORT_PATH : SYMBOL;

...only:

import IMPORT_PATH;

Same with the self.cre2 regexp (?:import\s)?\s*([a-zA-Z0-9_.]+)\s*(?:,|;) two lines later.

I believe both the self.cre and self.cre2 regexps need to be fixed, but I don't quite understand how they are related. My guess is that self.cre matches whole import statements and self.cre2 matches parts of them. Am I correct? If so, self.cre2 needs to be corrected to handle cases such as:

import X, Y, Z;

Does anyone have any idea how to fix the regexps so that they handle these cases?

My first try is to change

p = 'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*;'

to

p = 'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*;'

I've tried debugging this but in vain.

Python:

import re

# Raw strings keep sequences like \s from being read as string escapes.
p = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*;'

re.match(p, "import first;") # match
re.match(p, "import first : f;") # no match

p2 = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*;'

re.match(p2, "import first;") # match
re.match(p2, "import first : f;") # no match but should match
re.match(p2, "import first : f, second : g;") # no match but should match

Solution

  • Short Answer

    To handle all the cases you have outlined, try the following twist on your changes to the (self.cre) pattern:

    import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;
    

    Digging Deeper

    self.cre vs. self.cre2

    Yes, the find_include_names method...

    def find_include_names(self, node):
        includes = []
        for i in self.cre.findall(node.get_text_contents()):
            includes = includes + self.cre2.findall(i)
        return includes
    

    ...confirms the relationship between self.cre and self.cre2 that you guessed: the former matches entire import statements, and the latter matches (and captures) modules therein. (Note the middle (...) capture group in self.cre2 vs. (?:...) non-capture groups elsewhere throughout self.cre and self.cre2.)
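
    As a minimal sketch of that two-stage scan, the snippet below applies the two original patterns to a plain string. The helper find_module_names is a hypothetical stand-in for find_include_names (which reads its text from a node), and the sample source text is made up for illustration:

```python
import re

# The two original patterns, as quoted in the question.
cre = re.compile(r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*;')
cre2 = re.compile(r'(?:import\s)?\s*([a-zA-Z0-9_.]+)\s*(?:,|;)')


def find_module_names(text):
    """Hypothetical stand-in for find_include_names that takes a plain string."""
    names = []
    for statement in cre.findall(text):        # stage 1: whole import statements
        names.extend(cre2.findall(statement))  # stage 2: module names in each one
    return names


source = "import X, Y, Z;\nimport first : f;\n"
print(find_module_names(source))  # ['X', 'Y', 'Z']
```

    The plain multi-module import is handled by both stages, while the selective import never gets past self.cre, so self.cre2 is not even consulted for it.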

    self.cre

    Picking up where your Python snippet left off...

    import re
    
    import1 = "import first;"
    import2 = "import first : f;"
    import3 = "import first : f, second : g;"
    
    # Original self.cre pattern (raw strings keep \s from being read as a string escape)
    p = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)\s*)*;'
    
    pm1 = re.match(p, import1) # match
    if pm1 is not None:
        print("p w/ import1 => " + pm1.group(0))
    
    pm2 = re.match(p, import2) # no match
    if pm2 is not None:
        print("p w/ import2 => " + pm2.group(0))
    
    # Your first attempt at a fix
    p2 = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*;'
    
    p2m1 = re.match(p2, import1) # match
    if p2m1 is not None:
        print("p2 w/ import1 => " + p2m1.group(0))
    
    p2m2 = re.match(p2, import2) # no match but should match
    if p2m2 is not None:
        print("p2 w/ import2 => " + p2m2.group(0))
    
    p2m3 = re.match(p2, import3) # no match but should match
    if p2m3 is not None:
        print("p2 w/ import3 => " + p2m3.group(0))
    

    ...we get the following output. As expected, both p and p2 match only import1:

    p w/ import1 => import first;
    p2 w/ import1 => import first;
    

    Now consider p2prime, wherein I have made changes to arrive at the pattern I suggested above:

    import re
    
    import1 = "import first;"
    import2 = "import first : f;"
    import3 = "import first : f, second : g;"
    import4 = "import first, second, third;"
    
    # Suggested replacement for the self.cre pattern
    p2prime = r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;'
    
    p2pm1 = re.match(p2prime, import1) # match
    if p2pm1 is not None:
        print("p2prime w/ import1 => " + p2pm1.group(0))
    
    p2pm2 = re.match(p2prime, import2) # now a match
    if p2pm2 is not None:
        print("p2prime w/ import2 => " + p2pm2.group(0))
    
    p2pm3 = re.match(p2prime, import3) # now a match
    if p2pm3 is not None:
        print("p2prime w/ import3 => " + p2pm3.group(0))
    
    p2pm4 = re.match(p2prime, import4) # match (plain multi-module import)
    if p2pm4 is not None:
        print("p2prime w/ import4 => " + p2pm4.group(0))
    

    With the updated pattern (p2prime) we get the following desired output for its attempts to match the import statements:

    p2prime w/ import1 => import first;
    p2prime w/ import2 => import first : f;
    p2prime w/ import3 => import first : f, second : g;
    p2prime w/ import4 => import first, second, third;
    

    This is a pretty lengthy and involved pattern, so I would not be surprised to find opportunities to fine-tune it further; but it does what you want and should provide a solid basis for fine-tuning.
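
    One possible direction for such fine-tuning, offered only as a sketch: build the pattern from a reusable "item" (a module name with an optional ": symbol" suffix). The names MODULE, ITEM and SIMPLIFIED are made up for illustration; this variant also tolerates missing whitespace around the colon, and it has not been tried inside SCons or checked against D's full import grammar:

```python
import re

MODULE = r"[a-zA-Z0-9_.]+"  # same character class the original patterns use

# One import item: a module name, optionally followed by ": symbol".
ITEM = r"{m}(?:\s*:\s*{m})?".format(m=MODULE)

# A whole statement: "import", one item, then any number of ", item", then ";".
SIMPLIFIED = re.compile(r"import\s+{item}(?:\s*,\s*{item})*\s*;".format(item=ITEM))

samples = [
    "import first;",
    "import first : f;",
    "import first : f, second : g;",
    "import first, second, third;",
]

for statement in samples:
    match = SIMPLIFIED.match(statement)
    print(statement, "=>", match.group(0) if match else None)
```

    Each of the four sample statements is matched in full. Because every repeated group here has to consume at least one character, the pattern avoids nesting one optional group inside another repeated one.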

    self.cre2

    For self.cre2, similarly try the following pattern:

    (?:import\s)?\s*(?:([a-zA-Z0-9_.]+)(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?)\s*(?:,|;)
    

    Keep in mind, however, that since D's <module> : <symbol> selective imports are just that, selective, capturing only the module names in selective imports may not be what you ultimately need (e.g. vs. capturing both the module and the selected symbol names). As with the self.cre regexp I suggested, further fine-tuning where warranted should not be difficult.
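
    To illustrate why self.cre2 needs the change as well, here is a small sketch that runs the suggested self.cre pattern and then compares the original and suggested self.cre2 patterns on its output. The helper scan and the variable names are hypothetical; only the patterns come from this thread:

```python
import re

# Suggested replacement for self.cre (p2prime above).
new_cre = re.compile(
    r'import\s+(?:[a-zA-Z0-9_.]+)\s*(?:(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?'
    r'(?:,\s*(?:[a-zA-Z0-9_.]+)(?:\s*:\s*[a-zA-Z0-9_.]+)??\s*)*)*;'
)
# Original and suggested self.cre2 patterns.
old_cre2 = re.compile(r'(?:import\s)?\s*([a-zA-Z0-9_.]+)\s*(?:,|;)')
new_cre2 = re.compile(
    r'(?:import\s)?\s*(?:([a-zA-Z0-9_.]+)(?:\s+:\s+[a-zA-Z0-9_.]+\s*)?)\s*(?:,|;)'
)


def scan(text, statement_re, name_re):
    """Hypothetical two-stage scan: find statements, then names inside them."""
    names = []
    for statement in statement_re.findall(text):
        names.extend(name_re.findall(statement))
    return names


text = "import first : f, second : g;"
print(scan(text, new_cre, old_cre2))  # ['f', 'g']
print(scan(text, new_cre, new_cre2))  # ['first', 'second']
```

    With only self.cre fixed, the original self.cre2 picks up the selected symbols rather than the modules, because it captures whatever name sits directly before a comma or semicolon.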