python pandas cassandra

How to insert Pandas DataFrame into Cassandra?


I have a dataframe as below:

df

date        time       open   high   low   last
01-01-2017  11:00:00   37      45     36    42
01-01-2017  11:23:00   36      43     33    38
01-01-2017  12:00:00   45      55     35    43

....

I want to write it to Cassandra. It is essentially a bulk upload after processing the data in Python.

The schema for cassandra is as below:

CREATE TABLE ks.table1(date text, time text, open float, high float,
                       low float, last float, PRIMARY KEY(date, time))

To insert a single row into Cassandra we can use cassandra-driver in Python, but I couldn't find any details about uploading an entire DataFrame.

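from cassandra.cluster import Cluster

# Connect first; with no arguments, Cluster() targets a node on localhost
session = Cluster().connect()

session.execute(
    """
    INSERT INTO ks.table1 (date,time,open,high,low,last)
    VALUES ('01-01-2017', '11:00:00', 37, 45, 36, 42)
    """)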

P.S.: A similar question has been asked before, but it doesn't answer my question.


Solution

  • I faced this problem too, but I found that even uploading millions of rows (19 million, to be exact) into Cassandra didn't take much time.

    Coming to your problem, you can use the Cassandra bulk loader to get the job done.

    EDIT 1:

    You can use prepared statements to upload data into a Cassandra table while iterating through the DataFrame.

        from cassandra.cluster import Cluster
        cluster = Cluster([ip_address])  # contact points are passed as a list
        session = cluster.connect(keyspace_name)
        # "data" stands for your table name (table1 in the question)
        query = "INSERT INTO data(date,time,open,high,low,last) VALUES (?,?,?,?,?,?)"
        prepared = session.prepare(query)
    

    Each "?" is a placeholder that is bound to a value when the statement is executed.

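        # itertuples() yields one named tuple per row;
        # iterating the DataFrame directly would yield only the column names
        for item in dataFrame.itertuples(index=False):
            session.execute(prepared, (item.date, item.time, item.open, item.high, item.low, item.last))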
    

    or

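        # name=None yields plain tuples, so fields are accessed by position
        for item in dataFrame.itertuples(index=False, name=None):
            session.execute(prepared, (item[0], item[1], item[2], item[3], item[4], item[5]))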
    

    In other words, use a for loop to extract each row and upload it with session.execute().

    For more info, see the driver documentation on prepared statements.

    Hope this helps..
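
Putting the prepared-statement steps above together, a minimal end-to-end sketch might look like the following. It assumes the ks.table1 schema from the question and that `session` is an already-connected cassandra-driver session. The names `insert_dataframe`, `INSERT_CQL` and `COLUMNS` are hypothetical and introduced here only for illustration.

```python
import pandas as pd

# Column order must match the placeholders in the INSERT statement.
COLUMNS = ["date", "time", "open", "high", "low", "last"]

# Assumes the ks.table1 schema shown in the question.
INSERT_CQL = (
    "INSERT INTO ks.table1 (date, time, open, high, low, last) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)


def insert_dataframe(session, df):
    """Insert every row of df through one prepared statement.

    `session` is assumed to be a connected cassandra-driver Session
    (anything with compatible prepare() and execute() methods works).
    Returns the number of rows sent.
    """
    prepared = session.prepare(INSERT_CQL)
    count = 0
    # Select the columns explicitly so the tuple order is guaranteed;
    # name=None makes itertuples() yield plain tuples.
    for row in df[COLUMNS].itertuples(index=False, name=None):
        session.execute(prepared, row)
        count += 1
    return count


# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "date": ["01-01-2017", "01-01-2017", "01-01-2017"],
        "time": ["11:00:00", "11:23:00", "12:00:00"],
        "open": [37.0, 36.0, 45.0],
        "high": [45.0, 43.0, 55.0],
        "low": [36.0, 33.0, 35.0],
        "last": [42.0, 38.0, 43.0],
    }
)
```

Against a real cluster, the session could be created with `Cluster([ip_address]).connect()` and the upload run with `insert_dataframe(session, df)`. Preparing the statement once and reusing it for every row avoids re-parsing the query on each insert.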