Search code examples
pythonsql-server-2012sqlalchemyunixodbc

Why is my VARCHAR truncated to 255 characters? How do I fix this?


Setup:

I am using the following components:

  • unixODBC 2.3.4
  • FreeTDS 1.12
  • Python 3.6
  • SQLAlchemy

...against a MSSQL Server 2014.

The problem:

Suppose I have a table consisting of only 2 columns:

  • id (primary key, int)
  • my_text (VARCHAR())

My SQLAlchemy Model looks like this:

from sqlalchemy.dialects.mssql.base import VARCHAR

MyText(Base):
    id = Column(Integer, primary_key=True)
    my_text = Column(VARCHAR())

When I try to create a new text entry like so, and my text is longer than 255 characters, the string is silently truncated at 255.

my_text='REALLY LONG STRING THAT IS LONGER THAN 255. E.g.: 6000+ characters. Assume my string is 558 bp long.'
print(len(my_text))  # Gives 558 bp.
new_text = MyText(my_text=my_text)
print(len(new_text.my_text))  # Gives 558 bp.
db_s.add(new_text)
print(len(new_text.my_text))  # Gives 558 bp.
db_s.commit()
print(len(new_text.my_text))  # Gives 255 bp now after commit.

At first, I thought this is caused at write time to DB. But I found that this is caused at query time (read below).

Question:

1.) Why is this happening?

I think this has something to do with the drivers (e.g.: unixodbc 2.3.4, FreeTDS 1.12) but I am not entirely where exactly this is occurring.

2.) More importantly, how do I fix this issue?

These issues are related but different:

MySQL VARCHAR index truncated to 255 during migration

The problem above is that I am not using MYSQL. So the solution provided does not fix the issue.

unixODBC/FreeTDS results truncated to 255 character

No solution is provided in that thread.

On that note, it should be noted that my strings can be routinely longer that 6000 characters (as I am working with long DNA sequences). I would really appreciate any way to fix the truncation issue.

Update (2017-10-12):

I made some remarkable but equally puzzling discoveries since yesterday.

# Connecting via pyodbc direct connection using just some helper functions to make things more convenient.
con_str = create_connection_string(DATABASE='test')
cur = make_connection_db(connection_str=con_str)
for row in cur.execute('SELECT Text.my_text, len(Text.my_text) FROM [test].[dbo].[Text]'):
    print(row)
    print(len(row[0]))

This gives me a string that is 558 characters long (see below).

 ('ATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATG', 
 558)
 558

Now using SQL Alchemy but still direct sql statements.

# Using SqlAlchemy connection with direct SQL query.
eoi_engine = create_engine(
"mssql+pyodbc://user:somepw@db:1234/test?driver=FreeTDS")
s_con = eoi_engine.connect()
s_res = s_con.execute('SELECT Text.my_text, len(Text.my_text) FROM [test].[dbo].[Text]')
for row in s_res:
    print(row)
    print(len(row[0])

This gives me a string that it claims is 558 characters long but in reality it is only 255 characters long.

 ('ATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATG', 
 558)
 255

And last scenario:

# Using SQLAlchemy full-stack.
Session = sessionmaker()

s = Session(bind=s_con)
fs_res = s.query(DNAPtsSeq).filter().all()
for row in fs_res:
    print(row)
    print(len(row.nt_seq))

This gives me a string that is only 255 characters long.

So in summary:

  • The creation is caused not when WRITING things to the DB, but when querying things from the DB characters get silently truncated.
  • I am only getting this problem of truncation when I use SQLAlchemy operations.

I don't know why I am only getting this error when I query using SQLAlchemy. Does anyone know the reason why? How can I fix this behavior? (I know the workaround is to use direct sql queries...)


Solution

  • In my case, it turns out that there were several compounding errors, I am listing for posterity in case other people run into the same problem as I:

    1. When I initially created the table, instead of declaring the table as VARCHAR(MAX), I declared them as VARCHAR(8000).
    2. When I fixed the data table in the database, I did not change the ORM. The field was still VARCHAR(8000). Somehow this leads to a silent truncation.
    3. Even changing VARCHAR(8000) to VARCHAR() did not fix the issue for me. It turns out that MSSQL VARCHAR(MAX) is highly non-standard. An obscure reference, led me to discover that there is a different VARCHAR just for MSSQL. If you import VARCHAR from the following location:

      from sqlalchemy.dialects.mssql.base import VARCHAR
      

    ...you can then declare your text field to be VARCHAR() which equals VARCHAR(Max) in MSSQL speak.

    http://docs.sqlalchemy.org/en/latest/dialects/mssql.html

    This has increased the truncation limit to 4096 characters but not resolved it.

    I tried to decipher the following answer here:

    SQLAlchemy Truncating VARCHAR(MAX)

    Unfortuately, changing the text size buffer does not remove the truncation limit of 4096. In my case, I had to also precede queries to the DB with a sql statement from SQLAlchemy:

    db_s.execute('Set TEXTSIZE {0}'.format(SOME_BIG_NUMBER_LIKE_20000)