Search code examples
pythonsqlalchemyimpalakudu

Is it possible to create a Kudu table using SQLAlchemy?


I have a database in Impala, and I need to use Kudu tables in it. I'd like to use SQLAlchemy in my Python code to interact with the database. Although Impala is not a supported dialect in SQLALchemy, I have been able to access data and do basic modifications in my existing Kudu and non-Kudu tables.

I've read this question and its answers and also the GitHub page linked there, but they are about connecting to an Impala db, which I managed to do.

(I have also considered using the Kudu Python Client instead of SQLAlchemy, but that doesn't work because my code will run on Ubuntu 18.04 and it is not supported.)

My question is specifically about the creation of Kudu tables using SQLAlchemy. When I use plain SQL, it looks like this:

CREATE TABLE my_table (
    id INT,
    message STRING,
    PRIMARY KEY (id)
  )
  STORED AS KUDU

There are two things that indicate that this is a Kudu table: the presence of a primary key, which doesn't exist in Impala, and of course the last row.

Marking a column as primary key is not a problem in SQLAlchemy. But is there a way to include STORED AS KUDU in a table creation call?


Solution

  • Kudu support exists but is buggy (as of 2021 Oct 1)

    First: Impala dialect of SQLAlchemy is implemented, but in the impyla package.

    Impyla is supposed to support Kudu as well, see this issue and the commit related to it. However, starting from release 0.17a8, support for primary key constrains was omitted from the package, which crippled the Kudu support as Kudu tables need to have at least one primary key.

    So to answer the question, creating Kudu tables is possible using impyla 0.17a7, but not with any later releases - again, as of October 2021. I filed this issue about the matter.

    Feature usage

    Kudu support was implemented by adding two keyword arguments to the Table() constructor, namely impala_stored_as and impala_partition_by. Usage is obvious when taking the imperative approach and calling Table() directly. However, the arguments can easily be passed with the declarative approach too, using __table_args__ (which also works with mixins).

    For an explanation of the imperative and declarative mappings see this page of the SQLAlchemy documentation. For working examples, see the next section.

    Examples

    An imperative example, taken from here, but with fixed parameter name and primary key:

    >>> import sqlalchemy
    >>> engine = sqlalchemy.create_engine('impala://node-1.cluster')
    >>> table = sqlalchemy.Table('sweet_new_table', sqlalchemy.MetaData(), sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True), impala_stored_as='KUDU')
    >>> print(str(sqlalchemy.schema.CreateTable(table, bind=engine)))
    
    CREATE TABLE sweet_new_table (
        id INTEGER,
        PRIMARY KEY (id)
    )
    STORED AS KUDU
    

    My working solution for a declarative (ORM) example:

    import sqlalchemy
    from sqlalchemy.orm import declarative_base
    from sqlalchemy import Column, String
    
    Base = declarative_base()
    
    class MyTable(Base):
        __tablename__ = "my_table_name"
        __table_args__ = {"impala_stored_as": "KUDU"}
    
        id = Column(String, primary_key=True)
    
    
    if __name__ == "__main__":
        engine = sqlalchemy.create_engine("impala://node-1.cluster")
        Base.metadata.create_all(engine)