Search code examples
pythonpython-3.xpython-3.7cx-oracle

Replace specific control/non-printable characters from string


I am writing a script that reads sql files and uses cx_Oracle to execute the files. Some files are written using only spaces between each sql operator and keyword and others use newlines and tabs for whitespace. I am facing issue with the latter. For example this section returns the following:

NON_PRINTABLE = """\r\n\t"""
def parseSQLFile(filename):
    with open(filename, 'r') as sqlFile:
        allSQL = sqlFile.read()
        #filteredSQL = filter(lambda ln: ln in string.printable, allSQL)
        # replace specific control characters with spaces to prevent sql compiler errors
        for char in NON_PRINTABLE:
            allSQL.replace(char,' ')


    return allSQL

I've tried use the filter function, translate, and replace; however, I still get the following varying results from the following input:

Input:

'select\n\ts.id\n\t,s.src_cnt\n\t,s.out_file\t\nfrom\n\tkpi_index_ros.composites s\n\t,kpi_index_ros.kpi_index_rosoards d\nwhere\n\t1 = 1\n\tand s.kpi_index_rosoard_id (+) = d.id\n\tand d.active = 1\n;'

Output 1:

'select\n s.id\n ,s.src_cnt\n ,s.out_file \nfrom\n kpi_index_ros.composites s\n ,kpi_index_ros.kpi_index_rosoards d\nwhere\n 1 = 1\n and s.kpi_index_rosoard_id (+) = d.id\n and d.active = 1\n;'

Output 2:

'select \ts.id \t,s.src_cnt \t,s.out_file\t from \tkpi_index_ros.composites s \t,kpi_index_ros.kpi_index_rosoards d where \t1 = 1 \tand s.kpi_index_rosoard_id (+) = d.id \tand d.active = 1 ;'

It seems that It will either replace tabs OR newlines but not both. Is there any way to accomplish this in an efficient manner?


Solution

  • Making the following change (replacing allSQL value with the output of .replace method of String object) produces the desired output:

    NON_PRINTABLE = """\r\n\t"""
    def parseSQLFile(filename):
        with open(filename, 'r') as sqlFile:
            allSQL = sqlFile.read()
            # replace specific control characters with spaces to prevent sql compiler errors
            for char in NON_PRINTABLE:
                allSQL = allSQL.replace(char,' ') #updating allSQL with returned value
    
        return allSQL
    

    output:

    'select  s.id  ,s.src_cnt  ,s.out_file  from  kpi_index_ros.composites s  ,kpi_index_ros.kpi_index_rosoards d where  1 = 1  and s.kpi_index_rosoard_id (+) = d.id  and d.active = 1 ;'
    

    As of the second part of your question - regarding efficiency of such approach, you should probably refer to benchmark results in this answer.