I'm trying to load my DataFrame into AstraDB, but it's taking forever to load. I was wondering if there's a faster way to do it via Python?
import cassandra
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import pandas as pd
cloud_config = {
    'secure_connect_bundle': 'secure-connect-capstone-project.zip'
}
# user and password hold your Astra credentials (defined elsewhere)
auth_provider = PlainTextAuthProvider(user, password)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
# connect to the keyspace
session = cluster.connect('iac689')
query = """insert into data_2 (truck_id, active, reading_id, start_mileage, start_time, truck_name, type)
values (%s,%s,%s,%s,%s,%s,%s)"""
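# df is the pandas DataFrame being loaded; each insert waits for the previous one
for row in df.values:
    session.execute(query, [row[0], row[1], row[2], row[3], row[4], row[5], row[6]])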
If you really need to do this via Python, you can speed up the code in two ways.

First, call session.prepare on your query string once, and pass the resulting prepared statement to session.execute instead of the raw query string.
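A minimal sketch of the prepared-statement approach. It assumes session is a connected cassandra.cluster.Session (as in the question) and that rows yields 7-tuples in the table's column order; for the question's DataFrame, df.itertuples(index=False, name=None) should produce that. INSERT_CQL and load_rows_prepared are hypothetical names, not part of the driver. Note that prepared statements in the Python driver use ? placeholders instead of %s.

```python
# Sketch: prepare the INSERT once, then execute it for every row.
# Assumes `session` is a connected cassandra.cluster.Session and `rows`
# yields 7-tuples in the column order listed below (hypothetical helper names).

INSERT_CQL = (
    "INSERT INTO data_2 (truck_id, active, reading_id, start_mileage, "
    "start_time, truck_name, type) VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def load_rows_prepared(session, rows):
    """Insert rows one at a time using a single prepared statement."""
    # Prepared once; later executions send the statement ID plus bound values.
    prepared = session.prepare(INSERT_CQL)
    count = 0
    for row in rows:
        session.execute(prepared, row)  # still synchronous: waits for each insert
        count += 1
    return count
```

Usage: load_rows_prepared(session, df.itertuples(index=False, name=None)). Each insert is still a blocking round trip, so this alone removes per-query parsing overhead but not network latency.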
Second, use asynchronous execution (execute_async) instead of synchronous execution (execute). However, you then need to track how many queries are in flight, etc., to avoid overloading the cluster and getting errors.

Really, I would recommend not reinventing the wheel: dump the data to a CSV or JSON file and use DSBulk to load it into Cassandra/Astra. This tool is heavily optimized for loading and unloading data from Cassandra/Astra.