Search code examples
pythonpandascx-oracle

How to covert oracle clob column to list using pandas


I'm trying to read a clob column from oracle table into a pandas dataframe. I need to convert the pandas dataframe before I pass it to a classifier as list.

The result shows - with multiple entries

I want to see a result like ['string1', string2','string3']

Please help Code snippet below -

import pandas as pd
import cx_Oracle
connection = cx_Oracle.connect('username/pass@localhost/schema')
statement = "select clob_column from table "
df = pd.read_sql(statement, con=connection)
#print(df.head())
corpus=(df['CLOB_COLUMN'].values.tolist())
print (corpus)

Solution

  • You can solve your problem using the following approach:

    import cx_Oracle
    import pandas as pd
    
    connection_string = 'username/pass@localhost/schema'
    clob_column_name = 'CLOB_COLUMN'
    query = "SELECT {} FROM table".format(clob_column_name)
    
    with cx_Oracle.connect(connection_string) as connection:
        df = pd.read_sql(sql=query, 
                         con=connection)
        # need to read data from cx_Oracle.LOB objects
        df[clob_column_name] = df[clob_column_name].apply(lambda x: x.read())
    
    print(df[clob_column_name].values.tolist())
    
    # ['string1', 'string2', 'string3']