Tags: python, sqlalchemy, dask, dask-distributed, dask-dataframe

Loading Dask dataframe with SQLAlchemy fails


I'm trying to load a Dask dataframe with SQLAlchemy using dd.read_sql_query. I define a table in which the column balance_date is declared as DateTime (in the database it is type DATE):

from sqlalchemy import Column, DateTime, Float, Integer, PrimaryKeyConstraint, String

class test_loans(Base):
    __tablename__ = 'test_loans'
    annual_income = Column(Float)
    balance = Column(Float)
    balance_date = Column(DateTime)  # the type of the column is DateTime
    cust_segment = Column(String)
    total_amount_paid = Column(Float)
    the_key = Column(Integer)
    __table_args__ = (PrimaryKeyConstraint(the_key),)

The problem is that dd.read_sql_query fails, complaining that the index column is of type object rather than numeric or datetime:

stmt = select([test_loans.balance_date, test_loans.total_amount_paid])
ddf = dd.read_sql_query(stmt, con=con, index_col='balance_date', npartitions=3)

I get

TypeError: Provided index column is of type "object".  If divisions is
not provided the index column type must be numeric or datetime.

How can I fix this? Is this a bug?


Solution

  • The problem is solved by casting the column to DateTime in the SQLAlchemy select statement, so that the column comes back with a datetime dtype instead of object.
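A minimal sketch of that cast, assuming SQLAlchemy 1.4+ (the select(col, ...) call style rather than the older select([...])) and a trimmed-down version of the model from the question. Building the statement needs no database; the commented read_sql_query call shows how it would be used, where con is the database URI string Dask expects:

```python
from sqlalchemy import Column, DateTime, Float, Integer, cast, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class TestLoans(Base):
    __tablename__ = 'test_loans'
    the_key = Column(Integer, primary_key=True)
    balance_date = Column(DateTime)
    total_amount_paid = Column(Float)

# Cast the index column to DateTime inside the query itself, and keep its
# original name with .label() so index_col='balance_date' still matches.
stmt = select(
    cast(TestLoans.balance_date, DateTime).label('balance_date'),
    TestLoans.total_amount_paid,
)

# With the cast in place, the sample frame Dask builds gets a datetime
# index instead of object, and the call below no longer raises TypeError:
# ddf = dd.read_sql_query(stmt, con=con, index_col='balance_date', npartitions=3)
```

The key point is that the cast happens in SQL, before Dask inspects the result, so the index column already arrives as a datetime type.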