Tags: python, pyparsing

How to efficiently check whether a grammar matched and was replaced after calling transformString?


I have put together a Python script to strip RCS keywords from thousands of SQL files. It uses pyparsing's transformString to find and strip the known RCS tags. The stripping works; however, because I have no way to know whether transformString actually executed a parse action, my script blindly rewrites every SQL file, even when the scanned file contained no RCS keywords.

Below is my sample code that strips the RCS keywords. Before writing to the current file, I need to know whether the operation found any tokens and actually replaced them. If transformString made no replacements, I want to skip writing the file.

from pyparsing import *
# simulate some SQL code
original_code = """
CREATE OR REPLACE FUNCTION oracle_function_name
 (
 p_company_code IN varchar2

)
--
RETURN number
IS

-- $Workfile: oracle_function_name.sql $
-- $Author: az $
-- $Date: 2018/11/20 $
-- $Revision: #1 $

l_rate := 0;
end if;
Close cur_rate;
--
return l_rate;
end;
/

"""
# Grammar definitions
Workfile_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Workfile:') + Word( alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$') + LineStart()
Workfile_Grammar.setParseAction( replaceWith("") )

author_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Author:') + Word( alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$')  + LineStart()
author_Grammar.setParseAction(replaceWith(""))

date_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Date:') + Word( alphanums+"/"+alphanums+"/") + CaselessKeyword('$')  + LineStart()
date_Grammar.setParseAction(replaceWith(""))

revision_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Revision:') + Word( '#'+alphanums) + CaselessKeyword('$')  + LineStart()
revision_Grammar.setParseAction(replaceWith(""))

change_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Change:') + Word(alphanums) + CaselessKeyword('$')  + LineStart()
change_Grammar.setParseAction(replaceWith(""))

dateTime_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Date:') + Word( alphanums+"/"+alphanums+"/") + Word(alphanums+":"+alphanums+":"+alphanums) + CaselessKeyword('$')  + LineStart()
dateTime_Grammar.setParseAction(replaceWith(""))

header_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Header:') + Word( "//"+alphanums+"/"+alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$')  + LineStart()
header_Grammar.setParseAction( replaceWith("") )

postStripFile = author_Grammar.transformString(header_Grammar.transformString(dateTime_Grammar.transformString(change_Grammar.transformString(revision_Grammar.transformString(date_Grammar.transformString(Workfile_Grammar.transformString(original_code)))))))
# Is there a way to check whether these transformString calls found and removed any grammar matches (RCS keywords)?

print(postStripFile)

# this is where we write postStripFile back to the original file name 
# so that the files with RCS tags are stripped in place and the ones without are left in place without changes.

Solution

  • The simplest approach is to compare the string before and after calling transformString, and write out the file only if the two differ.

    # combine all transformers into a single parser, so the transform can be
    # done in one pass
    parser = (Workfile_Grammar
              | date_Grammar
              | revision_Grammar
              | change_Grammar
              | dateTime_Grammar
              | header_Grammar
              | author_Grammar
             )
    
    new_sql = parser.transformString(original_code)
    if new_sql != original_code:
        # the original has been transformed, so write new_sql back to the file
        pass
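The comparison approach also covers the file-writing step from the question. Below is a minimal sketch, assuming UTF-8 encoded files; `rewrite_if_changed` is a hypothetical helper, and `transform` stands for any str-to-str callable such as `parser.transformString`. A toy transform (`drop_author_line`, also hypothetical) stands in for the pyparsing parser so the sketch runs on its own.

```python
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_if_changed(path: Path, transform: Callable[[str], str]) -> bool:
    """Rewrite `path` with transform(text) only if the text actually changes.

    Returns True if the file was rewritten, False if it was left untouched.
    Assumes UTF-8; newline="" keeps the file's existing line endings as-is.
    """
    with path.open(encoding="utf-8", newline="") as f:
        original = f.read()
    new_text = transform(original)
    if new_text == original:
        return False  # nothing matched, so the file (and its mtime) is untouched
    with path.open("w", encoding="utf-8", newline="") as f:
        f.write(new_text)
    return True


def drop_author_line(text: str) -> str:
    # Toy stand-in for parser.transformString, used only for this demo.
    return "\n".join(line for line in text.split("\n")
                     if not line.startswith("-- $Author:"))


with tempfile.TemporaryDirectory() as tmp:
    tagged = Path(tmp, "tagged.sql")
    clean = Path(tmp, "clean.sql")
    tagged.write_text("IS\n-- $Author: az $\nend;\n", encoding="utf-8")
    clean.write_text("IS\nend;\n", encoding="utf-8")
    tagged_rewritten = rewrite_if_changed(tagged, drop_author_line)
    clean_rewritten = rewrite_if_changed(clean, drop_author_line)
    tagged_after = tagged.read_text(encoding="utf-8")

print(tagged_rewritten, clean_rewritten)  # True False
```

For the real script, the same helper could be called once per file in a loop such as `for p in Path(root).rglob("*.sql")`, where `root` is a placeholder for the directory being scanned.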
    

    A slightly more efficient approach is to add another parse action to all your expressions that sets a global variable to True:

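    changed = False
    def changes_made():
        # parse action with no arguments: it only records that a match occurred
        global changed
        changed = True
    
    Workfile_Grammar.setParseAction(changes_made, replaceWith(""))
    # ... repeat for the other grammars ...
    
    changed = False
    new_sql = parser.transformString(original_code)
    if changed:
        # ... write new_sql back to the file, etc. ...
        pass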
    

    setParseAction will accept multiple functions to be called after a successful parse. Since changes_made makes no modifications to the parsed tokens, it is just a pass-through as far as pyparsing is concerned.
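To see the pass-through behavior in isolation, here is a small sketch on a toy integer grammar (not the RCS grammar) that chains two parse actions: the first only records the match and returns None, so pyparsing keeps the tokens unchanged; the second then converts them. `record` and `to_int` are illustrative names.

```python
from pyparsing import Word, nums

seen = []


def record(tokens):
    # Side effect only; returning None tells pyparsing to keep the tokens as-is.
    seen.append(tokens[0])


def to_int(tokens):
    # Replaces the matched string token with an int.
    return int(tokens[0])


# Both actions run in order after a successful match: record() passes the
# tokens through unchanged, then to_int() replaces them.
integer = Word(nums)
integer.setParseAction(record, to_int)

result = integer.parseString("42")
print(result[0], seen)  # 42 ['42']
```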

    Be sure to reset the changed flag to False before each call to transformString in the same run, for example before processing each file.
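One way to avoid forgetting the reset is to do it inside a wrapper that handles one file's text at a time. This sketch uses a single simplified `Regex` in place of the question's seven grammars (an assumption: it only handles `--` comments, so it is not a drop-in replacement), and keeps the flag in a dict instead of a `global`; `strip_rcs` and `mark_changed` are hypothetical names.

```python
from pyparsing import Regex, replaceWith

# Simplified stand-in for the question's grammars: matches a
# "-- $Keyword: value $" comment for the listed RCS keywords.
rcs_comment = Regex(
    r"--[ \t]*\$(?:Workfile|Author|Date|Revision|Change|Header):[^$\n]*\$"
)

state = {"changed": False}


def mark_changed():
    # Pass-through parse action: only records that something matched.
    state["changed"] = True


rcs_comment.setParseAction(mark_changed, replaceWith(""))


def strip_rcs(text):
    """Return (new_text, changed) for one file's contents."""
    state["changed"] = False  # reset per call, or one hit would stick for later files
    new_text = rcs_comment.transformString(text)
    return new_text, state["changed"]


tagged, tagged_changed = strip_rcs("IS\n-- $Author: az $\nl_rate := 0;\n")
clean, clean_changed = strip_rcs("IS\nl_rate := 0;\n")
print(tagged_changed, clean_changed)  # True False
```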

    My personal preference would be the simpler first approach.