When processing files from sources that use more or less exotic character sets, one quite often converts them to UTF-8 CSVs first.
When using Sqoop to access a database, how does that work? I do not see a conversion clause, but I do note that HDFS is UTF-8 by default. Is the conversion automatic? I have heard, but could not confirm, that Sqoop converts to UTF-8 as standard.
Is this correct?
Yes, this is correct, going by the actual tests I executed.
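
To make concrete what such an automatic conversion amounts to, here is a minimal Python sketch. It is not Sqoop code: the function name `to_utf8`, the ISO-8859-1 source charset, and the sample value are all assumptions chosen for illustration. The idea is only that text in the source charset is first decoded into Unicode strings and then written out as UTF-8 bytes.

```python
# Illustration only, not Sqoop code.
# Assumption: the source stores text as ISO-8859-1 (Latin-1).
SOURCE_CHARSET = "iso-8859-1"


def to_utf8(raw: bytes, source_charset: str = SOURCE_CHARSET) -> bytes:
    """Decode bytes from the source charset, then re-encode them as UTF-8."""
    text = raw.decode(source_charset)  # bytes in source charset -> Unicode string
    return text.encode("utf-8")        # Unicode string -> UTF-8 bytes


# Hypothetical column value as it might be stored in the source system.
raw_value = "Café Zürich".encode(SOURCE_CHARSET)
utf8_value = to_utf8(raw_value)

# The text is unchanged; only the byte representation differs.
print(utf8_value.decode("utf-8"))
print(len(raw_value), len(utf8_value))
```

The accented characters take one byte each in ISO-8859-1 and two bytes each in UTF-8, so the byte length grows while the text itself stays the same. A quick test along these lines (import a few rows with non-ASCII characters and inspect the resulting file) is one way to confirm the behaviour in your own setup.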