I have a database in Impala, and I need to use Kudu tables in it. I'd like to use SQLAlchemy in my Python code to interact with the database. Although Impala is not a supported dialect in SQLALchemy, I have been able to access data and do basic modifications in my existing Kudu and non-Kudu tables.
I've read this question and its answers and also the GitHub page linked there, but they are about connecting to an Impala db, which I managed to do.
(I have also considered using the Kudu Python Client instead of SQLAlchemy, but that doesn't work because my code will run on Ubuntu 18.04 and it is not supported.)
My question is specifically about the creation of Kudu tables using SQLAlchemy. When I use plain SQL, it looks like this:
CREATE TABLE my_table (
id INT,
message STRING,
PRIMARY KEY (id)
)
STORED AS KUDU
There are two things that indicate that this is a Kudu table: the presence of a primary key, which doesn't exist in Impala, and of course the last row.
Marking a column as primary key is not a problem in SQLAlchemy. But is there a way to include STORED AS KUDU
in a table creation call?
First: Impala dialect of SQLAlchemy is implemented, but in the impyla package.
Impyla is supposed to support Kudu as well, see this issue and the commit related to it. However, starting from release 0.17a8
, support for primary key constrains was omitted from the package, which crippled the Kudu support as Kudu tables need to have at least one primary key.
So to answer the question, creating Kudu tables is possible using impyla 0.17a7
, but not with any later releases - again, as of October 2021. I filed this issue about the matter.
Kudu support was implemented by adding two keyword arguments to the Table()
constructor, namely impala_stored_as
and impala_partition_by
. Usage is obvious when taking the imperative approach and calling Table()
directly. However, the arguments can easily be passed with the declarative approach too, using __table_args__
(which also works with mixins).
For an explanation of the imperative and declarative mappings see this page of the SQLAlchemy documentation. For working examples, see the next section.
An imperative example, taken from here, but with fixed parameter name and primary key:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('impala://node-1.cluster')
>>> table = sqlalchemy.Table('sweet_new_table', sqlalchemy.MetaData(), sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True), impala_stored_as='KUDU')
>>> print(str(sqlalchemy.schema.CreateTable(table, bind=engine)))
CREATE TABLE sweet_new_table (
id INTEGER,
PRIMARY KEY (id)
)
STORED AS KUDU
My working solution for a declarative (ORM) example:
import sqlalchemy
from sqlalchemy.orm import declarative_base
from sqlalchemy import Column, String
Base = declarative_base()
class MyTable(Base):
__tablename__ = "my_table_name"
__table_args__ = {"impala_stored_as": "KUDU"}
id = Column(String, primary_key=True)
if __name__ == "__main__":
engine = sqlalchemy.create_engine("impala://node-1.cluster")
Base.metadata.create_all(engine)