java, utf-8, weka

Weka - load UTF-8 encoded csv


Is there a way in Weka 3.7.13 to load UTF-8 encoded files without converting them to ANSI?

I am trying to load a CSV file containing a string attribute whose values can contain emoticons, and I need to preserve them.


Solution

  • Yes, this is possible. The linked page describes how to do it from the command line or the GUI.

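    If using the command line, pass the JVM parameter -Dfile.encoding=utf-8 when starting Weka. It must come before -jar, for example: java -Dfile.encoding=utf-8 -jar weka.jar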

    If using the GUI, edit the RunWeka.ini file and change the fileEncoding placeholder to utf-8.
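
To check that the setting took effect, it can help to print the encoding the JVM is actually using and to confirm that an emoticon survives a UTF-8 write and read. The following is only a minimal sketch in plain Java; it does not call any Weka API, and the class name Utf8CsvCheck and the method roundTripUtf8 are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of Weka.
public class Utf8CsvCheck {

    // Writes a header plus one data row to a temporary CSV file as UTF-8,
    // reads the file back as UTF-8, and returns the data row as read.
    static String roundTripUtf8(String row) {
        try {
            Path csv = Files.createTempFile("emoticons", ".csv");
            try {
                Files.write(csv, Arrays.asList("id,text", row), StandardCharsets.UTF_8);
                List<String> lines = Files.readAllLines(csv, StandardCharsets.UTF_8);
                return lines.get(1);
            } finally {
                Files.deleteIfExists(csv);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Shows which encoding the JVM picked up, e.g. after -Dfile.encoding=utf-8.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // The emoticon is written as a Unicode escape so that this source file
        // stays pure ASCII and compiles the same way under any source encoding.
        String row = "1,\"great day \uD83D\uDE00\"";
        System.out.println("emoticon preserved: " + row.equals(roundTripUtf8(row)));
    }
}
```

Running this once with and once without -Dfile.encoding=utf-8 shows whether the default charset changes on your machine. The round trip itself passes the charset explicitly, so it does not depend on that setting.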