csv unicode apache-drill

Apache Drill losing Unicode in TSVs


I'm using the text/tsv storage plugin with Apache Drill, and the output TSV files contain ? in place of Unicode characters. If I use the JSON storage plugin instead, the Unicode comes through fine.

The request looks something like this:

URL: http://localhost:8047/query.json

Payload:

{
  "queryType":"SQL",
  "query": "CREATE TABLE st.`repo`.`test` AS SELECT * FROM st.`repo`.`unicode_data`"
}
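
For anyone reproducing this from a script, here is a minimal Python sketch of how the payload above could be posted to that URL. The helper names build_request and run_query are hypothetical, and it assumes an unauthenticated Drill instance listening on localhost:8047 as in the question. Only the request is built here; run_query needs a running Drill to be called.

```python
import json
import urllib.request

# Endpoint taken from the question; adjust host/port for your own setup.
DRILL_URL = "http://localhost:8047/query.json"


def build_request(sql, url=DRILL_URL):
    """Build a POST request carrying the SQL statement as a UTF-8 JSON body."""
    payload = {"queryType": "SQL", "query": sql}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the request and return the parsed JSON response (needs a running Drill)."""
    with urllib.request.urlopen(build_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The CTAS statement from the question.
CTAS = "CREATE TABLE st.`repo`.`test` AS SELECT * FROM st.`repo`.`unicode_data`"
request = build_request(CTAS)
```

Calling run_query(CTAS) against a live instance would submit the same statement as the payload shown above.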

Solution

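  • Set the JVM file encoding to UTF-8 in the environment Drill is started from, then restart Drill so the setting takes effect. This fixed it for me: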

    JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
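
As a rough illustration of why the ? characters appear, here is a small Python analogy (not Drill's code). It assumes the JVM's default charset on the affected machine could not represent the characters in the data, so the text writer substituted a replacement character; ASCII stands in for that charset here, and the sample string is made up.

```python
# Hypothetical sample text with non-ASCII characters.
text = "naïve café 日本"

# Encoding with a charset that cannot represent these characters
# substitutes "?" for each one, which is the symptom seen in the TSV output.
lossy = text.encode("ascii", errors="replace")

# Encoding as UTF-8 keeps every character, so the text round-trips intact.
utf8 = text.encode("utf-8")
restored = utf8.decode("utf-8")

print(lossy)     # b'na?ve caf? ??'
print(restored)  # naïve café 日本
```

Once the substitution has happened the original characters cannot be recovered from the file, so the affected tables need to be regenerated after changing the encoding.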