windows dataset weka

Weka UI language configuration error while reading file


While trying to add machine learning to my project, I used WEKA. To train and test it, Weka processes a collection of data that is in Russian. But while reading the data it shows unreadable characters ('ЧÑ, о Ñ'). I understand that this is due to a language configuration error, but I can't find a solution. Any help is appreciated.

WEKA UI screenshot

I have Java 1.8 and Weka 3.8. My dataset looks like this: "Российский ситком (ситуационная комедия) «Интерны», совмещенная адаптация «Клиники» и «Доктора Хауса»". My folder looks like this:

-kino1tr:
    -good
    -bad
    -neutral


Solution

  • I made a silly mistake. When loading the data, there is a charSet field for specifying the character encoding. Setting charSet to UTF-8 resolves the issue.
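For anyone wondering why the charset matters, here is a minimal plain-Java sketch (no Weka involved) of the same effect. It assumes the data files are saved as UTF-8 and that the loader was falling back to a single-byte encoding such as ISO-8859-1. The question does not say which default was actually in use, so treat that part as a guess. The class name `CharsetDemo` and the `decode` helper are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Decode raw bytes into text using the given character set.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // A 7-letter Cyrillic word from the dataset sample, written with
        // Unicode escapes so the result does not depend on how this
        // source file itself is encoded.
        String original = "\u0418\u043D\u0442\u0435\u0440\u043D\u044B";

        // These are the bytes a UTF-8 text file would contain for that word.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Same bytes, two different interpretations.
        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // assumed wrong default

        System.out.println("UTF-8 decode matches original:   " + right.equals(original));
        System.out.println("Latin-1 decode matches original: " + wrong.equals(original));
        System.out.println("Chars after UTF-8 decode:   " + right.length());
        System.out.println("Chars after Latin-1 decode: " + wrong.length());
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8, so the wrong decoding turns 7 letters into 14 unrelated characters, which is the kind of garbage shown in the question. If you load the folder from code rather than through the GUI, I believe the matching setting is `setCharSet("UTF-8")` on `weka.core.converters.TextDirectoryLoader`, but check that against the Weka 3.8 Javadoc before relying on it.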