Tags: sql-server, database, dataframe, pyspark, spark-jdbc

Pyspark - SQL Server 2005 - SQL Exception -151


I'm facing an issue fetching data from a database hosted on SQL Server 2005 through PySpark. I have a table with 5 columns (the Spark types I'd expect are sketched after the list):

- index -> int
- category -> nvarchar
- date_modified -> datetime, format YYYY-MM-DD HH:MM:SS.SSS
- category2 -> nvarchar
- category3 -> nvarchar
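
For reference, here is the schema I would expect Spark to infer for these columns, as a rough sketch using pyspark.sql.types (int should map to an integer, nvarchar to string, and datetime to timestamp):

from pyspark.sql.types import (StructType, StructField,
                               IntegerType, StringType, TimestampType)

# Expected schema for the 5 columns above once the load succeeds:
# int -> IntegerType, nvarchar -> StringType, datetime -> TimestampType.
expected_schema = StructType([
    StructField("index", IntegerType()),
    StructField("category", StringType()),
    StructField("date_modified", TimestampType()),
    StructField("category2", StringType()),
    StructField("category3", StringType()),
])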

When I try to import the data using this command:

conn = (spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://IP:PORT;database=dbname;user=xxxxxxxx;password=xxxxxxxxx")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("com.microsoft.sqlserver.jdbc.sendTimeAsDatetime", "false"))
conn = conn.option("dbtable", "(SELECT [index],[category],[date_modified],[category2],[category3] FROM table) AS result")
df = conn.load()

I'm getting an SQL exception: java.sql.SQLException: Unrecognized SQL type -151

If I convert the date_modified column to a string, I can fetch the data properly, but I would like to import it as-is, because I have a process that checks the pattern of the date and detects discrepancies.
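
The workaround looks roughly like this: a minimal sketch that casts date_modified to a string inside the pushdown subquery (CONVERT with style 121 produces the YYYY-MM-DD HH:MM:SS.SSS pattern; the connection details and names are the placeholders from above):

query = ("(SELECT [index],[category],"
         "CONVERT(varchar(23), [date_modified], 121) AS [date_modified],"
         "[category2],[category3] FROM table) AS result")
# Casting server-side means the JDBC driver only ever sees a varchar,
# so the unrecognized datetime type never reaches Spark.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://IP:PORT;database=dbname;user=xxxxxxxx;password=xxxxxxxxx")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", query)
      .load())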

The strange thing is that the exact same query on another database, with the same structure and the same table but on SQL Server 2016, works fine. I'm working with Spark 2.3.

Can someone help me out with this issue?


Solution

  • The issue is solved. It was caused by an argument that was doing a SELECT * on the whole table. After removing this parameter, I was able to extract the data properly. An upgrade of the database version is planned.
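
For completeness, a minimal sketch of the read after the fix, assuming the removed argument was an extra option that pushed a SELECT * over the whole table and that only the explicit column list in dbtable is kept (placeholders as in the question):

# Only the explicit column subquery is pushed down; no extra option
# triggers a SELECT * over the whole table.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://IP:PORT;database=dbname;user=xxxxxxxx;password=xxxxxxxxx")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", "(SELECT [index],[category],[date_modified],"
              "[category2],[category3] FROM table) AS result")
      .load())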