Search code examples
pythonoracle-databaseoracle11glarge-data

Python - Download a 20 gb of dataset/datadump from Oracle / Netezza server to my local disk drive using Python


want help for the following task :

I want to download a 20gb of dataset/datadump from Oracle server (oracle 11g database) to my local disk drive (i.e. E:/python/). I want to achieve this using Python 3.4 (windows64 bit ; I'm using Anaconda - spyder IDE)

I normaly use SAS for the task using following query:

LIBNAME ORACLE ODBC DSN= oracle UID= user PWD= password; #CONNECTION TO SERVER
LIBNAME LOCAL "E:/PYTHON"; #SETTING LOACAL PATH FOR DATA STORE

CREATE TABLE LOCAL.MYnewTable AS
SELECT * FROM ORACLE.DOWLOAD_TABLE
;QUIT;

Above query will download 20GB of datadump from the server to my local E:/ drive using SAS. How to do the same in Python?? My RAM is only 4gb so downloading entire 20gb dataset in the Pandas' data frame will eat up the RAM (i believe!! I may be wrong). SAS does this task very easily. Please suggest the query for Python. Request you all to share the code.

Thanks!!


Solution

  • Okey! So have gotten solution to my own question: This can be done using cxOracle as well, but i am using Python 3.5 and apparently cxOracle for python 3.5 is not available (to my knowledge) and that is why i have used "pyodbc" package

    import csv
    import pyodbc
    conn = pyodbc.connect('''DRIVER=<<name of server connection in ODBC driver>>;
                      SERVER= <<server IP>> i.e.: <<00.00.00.00>>;
                      PORT= <<5000>>;
                      DATABASE=<<Server database name>>;
                      UID= <<xyz>>;
                      PWD= <<****>>;''')
    
    
    # needs to be at the top of your module
    
    def ResultIter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
        while True:
            results = cursor.fetchmany(arraysize)
            if not results:
               break
            for result in results:
               yield result
    
    
    # where con is a DB-API 2.0 database connection object
    cursor = conn.cursor()
    cursor.execute('select * from <<table_name>>')
    
    
    csvFile = open('stored_data_fetched_file.csv', 'a')
    csvWriter = csv.writer(csvFile)
    
    for result in ResultIter(cursor):
        csvWriter.writerow(result)
    
    csvFile.close()
    

    This can be used for Netezza connection as well. Have tried and tested.