Search code examples
csvnullapache-drill

How to make Apache Drill understand explicit `NULL` in a csv file?


Assume I have this CSV file:

a,b
NULL,1
1,2

Then I try doing some queries:

> SELECT * FROM dfs.`/path/to/foo.csv`;
+-------+----+
|   a   | b  |
+-------+----+
| NULL  | 1  |
| 1     | 2  |
+-------+----+
2 rows selected

> SELECT * FROM dfs.`/path/to/foo.csv` WHERE a is NULL;
+----+----+
| a  | b  |
+----+----+
+----+----+
No rows selected

> SELECT * FROM dfs.`/path/to/foo.csv` WHERE a is NOT NULL;
+-------+----+
|   a   | b  |
+-------+----+
| NULL  | 1  |
| 1     | 2  |
+-------+----+
2 rows selected

> SELECT * FROM dfs.`/path/to/foo.csv` WHERE a > 0;
Error: SYSTEM ERROR: NumberFormatException: NULL

Fragment 0:0

It seems Apache Drill is considering NULL as a string.

Even if I replace it with an empty string (sed -i 's/NULL//g' foo.csv), Apache Drill still thinks those values are not null.

Is there any way to make it consider NULL (or empty value) as… well… null?

I'm using the latest release version: 1.9.0.


Solution

  • Absolutely you can!

    1. Open up the console at http://[your drillbit ip]:8047 and click on the "Options" button on the top right hand corner.
    2. Find "drill.exec.functions.cast_empty_string_to_null" and set it to true.