Tags: python, cassandra, datastax, datastax-python-driver

Python Cassandra Driver: encoding issue during insertion


I'm developing a simple Python module that reads data from a TSV file and loads it into a table in a Cassandra keyspace.

I started by looking at the examples given by Datastax and everything seemed to be ok, so at that point I began to code.

The program reads data from the TSV file correctly and translates it into a list of rows, and I verified that every element of each row has the right type for its destination column. But when I try to insert a row into the table, the terminal says:

AttributeError: 'float' object has no attribute 'encode'

This is the code:

#Upload data to Cassandra DB (cassandra_df is a Pandas dataframe)
session.set_keyspace(data_ks)
cassandra_df_list = cassandra_df.values.tolist()

query = "INSERT INTO table_str (rowid,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an,ao,ap,aq,ar,as,at,au,av,aw,ax,ay,az,ba,bb,bc,bd) VALUES (uuid(),?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
prepared = session.prepare(query)

for row in cassandra_df_list:

    prepared.bind(row)
    session.execute(prepared)

cluster.shutdown()

I made a lot of changes to try to solve the problem, but I either got new errors or the same error with 'int' instead of 'float'. I also read other questions here and tried passing str(row) and repr(row) to prepared.bind(), but that produced other errors.

I'm new to Python and can't find any other solutions. What would you do?

Thanks in advance!

Edit: Sorry, I forgot to give details about the DB table. Here is the creation statement:

CREATE TABLE prova.table_str (
rowid uuid PRIMARY KEY,
a text,
aa text,
ab text,
ac text,
ad text,
ae text,
af text,
ag text,
ah text,
ai text,
aj double,
ak double,
al double,
am text,
an double,
ao double,
ap double,
aq double,
ar double,
as double,
at double,
au double,
av double,
aw double,
ax double,
ay double,
az double,
b text,
ba double,
bb text,
bc text,
bd text,
c text,
d text,
e int,
f text,
g text,
h text,
i text,
j text,
k double,
l int,
m text,
n double,
o int,
p int,
q text,
r text,
s text,
t text,
u text,
v int,
w text,
x text,
y text,
z text

)


Solution

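  • You didn't share your schema or a stack trace, but my guess is that the dataframe holds numeric types while your Cassandra table has a bunch of string columns. A prepared statement encodes each bound value according to its column type, so a float bound to a text column fails when the driver tries to call encode on it. I'll outline three possible resolutions: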

    1.) Make the table types match your data so the bind encoding works.

    2.) Convert your parameters to the same types as your schema. For example, if they're all strings:

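    # bind() needs a sequence (it checks the length), and it returns the bound statement to execute
    bound = prepared.bind([str(c) for c in row])
    session.execute(bound)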
    

    3.) Use simple statements instead of preparing. In this case you would replace the ? bind markers with %s and let the driver use string interpolation of the parameters.

    query = "INSERT INTO table_str (rowid,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an,ao,ap,aq,ar,as,at,au,av,aw,ax,ay,az,ba,bb,bc,bd) VALUES (uuid(),%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
    for row in cassandra_df_list:
        session.execute(query, row)
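
If the table mixes text, int and double columns, option 2 needs a per-column conversion rather than a blanket str(). Here is a minimal sketch of that idea in plain Python. The names coerce_row and CONVERTERS are hypothetical helpers, not part of the driver, and treating NaN as a null is an assumption about how you want missing dataframe cells stored.

```python
import math

# Hypothetical mapping from CQL type name to a Python converter (not a driver API).
CONVERTERS = {"text": str, "int": int, "double": float}


def coerce_row(row, cql_types):
    """Return a new list with each value converted to the type its column expects."""
    if len(row) != len(cql_types):
        raise ValueError("row and cql_types must have the same length")
    coerced = []
    for value, cql_type in zip(row, cql_types):
        # Assumption: missing cells arrive as None or float NaN and should become null.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            coerced.append(None)
        else:
            coerced.append(CONVERTERS[cql_type](value))
    return coerced


# Types of the first five bound columns (a, b, c, d, e) in the order used by the INSERT.
example_types = ["text", "text", "text", "text", "int"]
example_row = [12.5, "x", float("nan"), 7, 3.0]
print(coerce_row(example_row, example_types))
```

With the real table you would list the types for all bound columns in the same order as the INSERT statement, then pass the result to the driver, for example session.execute(prepared, coerce_row(row, types)).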