In the past I wrote a pl/sql script that takes a table name and a column name (that indicates source) as arguments and then profiles all the columns in the table giving useful counts.
I'm currently teaching myself python and am re-writing that pl/sql script in a way that can be executed against other sql databases, not just oracle. So I am new to Python. I'm going through Automate the Boring Stuff on Udemy. At the moment I'm not concerned with sql injection because I'm just learning the Python language. I have left out the create table statements to reduce the amount of code I'm pasting.
The script is inserting the correct records on the first pass of the loop however it does not start the 2nd loop. Here is the IDLE output, and then the code.
================================================ RESTART: C:\Users\nathan\Documents\_work\_data_profiling_script\profiling_python_tester.py ================================================
('ETL_INS_DTM',)
insert into PROFILING_NWS6_PRT
select 'PROFILING_NWS6', 'ETL_INS_DTM', SRCRECNO, count(*), null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null
from PROFILING_NWS6
group by SRCRECNO
order by 1,2,3
executed
committed
**Traceback (most recent call last):
File "C:\Users\nathan\Documents\_work\_data_profiling_script\profiling_python_tester.py", line 39, in <module>
for row in cursor:
cx_Oracle.InterfaceError: not a query**
import cx_Oracle
conn = cx_Oracle.connect("system", "XXXX", "localhost/xe")
cursor = conn.cursor()
## parameter declaration
##########################################################################
# These 2 parameters populated by user
v_st = 'PROFILING_NWS6' # Source Table - table in which we are profiling the data
v_srcno = 'SRCRECNO' # Source Number - numeric column in v_st that identifies the source system
# These 3 parameters automatically populated
v_prt = v_st + '_PRT' # Profile Report Table - table name we want our report created as
v_log = v_st + '_LOG' # Log Table - script logging goes here, used for monitoring and debugging
v_top = v_st + '_TOP' # Top Table - temporary table to hold top 5 counts
# write script that populates Profile Report Table with rows for each source/column combination from source table
# these are required to join to when updating analysis fields
##########################################################################
sql = "Select column_name from user_tab_columns where table_name = '"+ v_st + "' and column_name <> '" + v_srcno + "'"
cursor.execute(sql)
for row in cursor:
print(row)
sql = """insert into {x_prt}
select '{x_st}', '{x_row}', {x_srcno}, count(*), null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null
from {x_st}
group by {x_srcno}
order by 1,2,3""".format(x_prt = v_prt, x_srcno = v_srcno, x_st = v_st, x_row = row[0])
print(sql)
cursor.execute(sql)
print('executed')
cursor.execute('commit')
print('committed')
#close connections
##########################################################################
cursor.close()
conn.close()
The cursor in for row in cursor:
is still in use until the loop completes. When you do cursor.execute(sql)
inside the loop, it changes the cursor
object. So on the 2nd loop, the cursor
item being iterated is the cursor from the commit
inside the loop. Solution is to create or use a different cursor object inside the loop.
cursor = conn.cursor() # original cursor, as above
insert_cursor = conn.cursor() # new one for insert
sql = "Select column_name from user_tab_columns where table_name " # etc
for row in cursor.execute(sql):
print(row)
sql = """second sql""".format(...)
print(sql)
insert_cursor.execute(sql)
print('executed')
insert_cursor.execute('commit')
print('committed')
cursor.close()
insert_cursor.close()
conn.close()
Also, for row in cursor:
should be for row in cursor.fetchall():
. Or just
for row in cur.execute(sql):
.