Ok, let's save someone 8 hours of clueless debugging.
TL;DR: Apache drill cannot correctly parse CSV files generated on windows machines. That's because their EOL is set to \r\n by default unlike to unix system, where it is set to \n. And this leads to horribly undebuggable errors because the leading \r probably stays clued to the last field's value. And what's funny, you won't notice this because it's invisible.
Let's have two files, one created in linux and the second in windows: hello.linux.csv and hello.win.csv. The content is the same (at least it looks like it is ...)
field_a,field_b
Hello,0.5
Let's have a query.
SELECT * from (...)/hello.linux.csv;
---
field_a, field_b
Hello, "0.5"
SELECT * from (...)/hello.win.csv;
---
field_a, field_b
Hello, "0.5"
Fine! Let's do something with the data. Cast "0.5" to number should be fine (and necessary).
SELECT
field_a, CAST (field_b as DECIMAL(10, 2)) as test
from (...)/hello.linux.csv;
---
field_a, test
Hello, 0.5
-- ... aaand, here we go!
SELECT
field_a, CAST (field_b as DECIMAL(10, 2)) as test
from (...)/hello.win.csv;
[30038]Query execution error. Details:[
SYSTEM ERROR: NumberFormatException
Fragment 0:0
Please, refer to logs for more information. -- In the logs, there is only useless java stacktrace, of course.
[Error Id: 3551c939-3f5b-42c1-9b58-d600da5f12a0 on drill-develop-7bdb45c597-52rnz:31010]
]
...
(And now, imagine how much time would take to reveal this on a complex production setup where the queries, data and other factors are somehow more complicated.)
You can update csv format line delimiter to \r\n
but this would apply to all csv files in the scope of your text plugin. To change delimiter per table use table function.