I could not find a question that resembles this and had no luck at Quora and Reddit. I would really appreciate being pointed in the right direction as I am really out of my depth.
I want to connect my laptop to a remote server. Next, I would like to run a cx_Oracle
query written in Python on the remote server to pull data from an Oracle database stored in another remote server.
For example:
import cx_Oracle
import pandas as pd
conn = cx_Oracle.connect(user="ABC", password="DEF", ip=...) # Pretend this works
query_string = """SELECT Col1, Col2, Col3
FROM TABLE_1
WHERE [Col1] > [Col2]
AND [Col3] <> 99""")
df = pd.read_sql(query_string, myconnection)
TABLE_1 is big (over 100GB). After applying the WHERE
statements, the result goes down to around 3 GB.
If I execute this query on the first server to pull data from the second server, which server's resources would be used when running the script? In particular, would the first server ever attempt to load TABLE_1
in its RAM or would it only store the filtered result in df
?
Any query you execute will run on the database server. Applying where clauses to reduce the amount of data will use greater resources on the database server, but will reduce the amount of data that must be transferred to the client (the computer running your Python program). So depending on the where clauses (are indices used to reduce the load on the database server?) and the speed of the network and the capability of the database server and the client you can decide where to put the load. Generally, however, it makes sense to reduce the amount of data that the client needs to process and the server needs to transfer to the client.