Search code examples
sqlconcatenationdatabricksdatabricks-sql

Databricks SQL CONCAT function adding characters where not requested?


I'm trying to use the CONCAT function to concatenate two columns in Databricks SQL. However, when I try to use it the CONCAT functions is prepending and appending .0 to the values in the second column. For example, my data looks like this:
ColA ColB
10001 25678
18921 25678
27331 89011

My query looks like this:

`SELECT cola, colb, CONCAT(cola, colb) AS colc
FROM <parquet table in question>`

And the output I'm expecting is
ColA ColB ColC
10001 25678 1000125678
18921 25678 1892125678
27331 89011 2733189011

But instead I get this:


ColA ColB ColC
10001 25678 10001.025678.0
18921 25678 18921.025678.0
27331 89011 27331.089011.0

Why is CONCAT adding these zeroes and decimal points? I had thought it may be related to the fact that I'm working with integers, but I tested the concat function of w3schools.com using their interactive command runner and it concatenated fine on there. What have I missed about the concat function?


Solution

  • While I did not manage to solve the issue directly with SQL, I did solve it with SparkR with the help of another answer to a related question I posted (Is there a more memory efficient way to concatenate two columns in SparkR?) I think the issue was a type error. I think the two columns being concatenated were being converted to doubles, hence why they were getting decimal points added to them. The solution was to cast them as strings or ints to get rid of the decimals.