I have a dataframe as below:
df
date time open high low last
01-01-2017 11:00:00 37 45 36 42
01-01-2017 11:23:00 36 43 33 38
01-01-2017 12:00:00 45 55 35 43
....
I want to write it into Cassandra. It is essentially a bulk upload that runs after the data has been processed in Python.
The Cassandra table schema is as follows:
CREATE TABLE ks.table1 (date text, time text, open float, high float, low float, last float, PRIMARY KEY (date, time));
To insert a single row into Cassandra we can use cassandra-driver in Python, but I couldn't find any details about uploading an entire dataframe.
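from cassandra.cluster import Cluster

# Assumes a node reachable on localhost; pass contact points to Cluster() otherwise.
session = Cluster().connect()
session.execute(
    """
    INSERT INTO ks.table1 (date, time, open, high, low, last)
    VALUES ('01-01-2017', '11:00:00', 37, 45, 36, 42)
    """)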
P.S. A similar question has been asked before, but its answers don't address my problem.
I faced the same problem, and found that even uploading millions of rows (19 million, to be exact) into Cassandra didn't take much time.
For your problem, you can use the Cassandra bulk loader to get the job done.
EDIT 1:
You can use prepared statements to upload data into the Cassandra table while iterating through the dataframe.
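from cassandra.cluster import Cluster

# Contact points must be a list of addresses, not a bare string.
cluster = Cluster([ip_address])
session = cluster.connect(keyspace_name)
query = "INSERT INTO data(date,time,open,high,low,last) VALUES (?,?,?,?,?,?)"
# Prepare once, then reuse the statement for every row.
prepared = session.prepare(query)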
Each "?" in the query is a placeholder that is bound to a value when the statement is executed.
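# Iterating a DataFrame directly yields column names, so iterate over rows instead.
# Option 1: access the fields by column name.
for item in dataFrame.itertuples(index=False):
    session.execute(prepared, (item.date, item.time, item.open, item.high, item.low, item.last))
# Option 2: access the fields by position.
for item in dataFrame.itertuples(index=False):
    session.execute(prepared, (item[0], item[1], item[2], item[3], item[4], item[5]))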
In other words, use a for loop to extract each row and upload it with session.execute().
For more information, see the cassandra-driver documentation on prepared statements.
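Calling session.execute() once per row waits for each write to finish before sending the next, which can be slow for large dataframes. A minimal sketch of one alternative, assuming a session object that behaves like a cassandra-driver Session (execute_async() returns a future whose result() blocks until the write completes): send the rows in groups and wait for each group. The helper names chunked and upload_rows and the window parameter are hypothetical, not part of the driver.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upload_rows(session: Any, prepared: Any, rows: Iterable[Row], window: int = 100) -> int:
    """Insert every row with the prepared statement and return the row count.

    Assumption: `session.execute_async(prepared, row)` returns a future whose
    `result()` blocks until the write finishes and raises if it failed.
    """
    total = 0
    for chunk in chunked(rows, window):
        # Send up to `window` inserts without waiting for each one...
        futures = [session.execute_async(prepared, row) for row in chunk]
        # ...then wait for all of them before sending the next group.
        for future in futures:
            future.result()
        total += len(chunk)
    return total
```

With the session and prepared objects from the snippet above, a call might look like upload_rows(session, prepared, dataFrame.itertuples(index=False, name=None)). The window size of 100 is an arbitrary starting point, not a tuned value. The driver also ships cassandra.concurrent.execute_concurrent_with_args, which may be a better fit than a hand-rolled loop.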
Hope this helps..