Unable to determine structure as arff when using utf-8 arff file in Weka

I am facing an issue when I try to open an ARFF file with Weka.

When the encoding of the ARFF file is set to ANSI, everything seems to work well. But when I set the encoding to UTF-8 (which is what my data requires), I get the following error:

Unable to determine structure as arff(Reason java.io.Exception: keyword @relation expected,read token[@relation], line 1).

My ARFF file seems to be properly formatted:

@relation myrelation

@attribute pagename string
@attribute pagetext string
@attribute pagecategory string
@attribute pageclass {0,1,2,3,4,5,6,7,8,9,10}

@data
.......

Note: I also changed the file encoding to UTF-8 in the RunWeka.ini file.

Solution

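As the error mentions line 1, I suspect the UTF-8 file was written with a BOM (byte order mark) at the start of the file. This unneeded, invisible zero-width character is written by Notepad under Windows to distinguish a UTF-8 text file from an ANSI text file. That would also explain why the token in the error message looks identical to the expected keyword: the invisible BOM is attached to the front of @relation.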

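Create the file without the BOM (U+FEFF). This can be done with a programmer's editor (jEdit, Notepad++), a hex editor, or you could delete the first line and re-type it. Then check the file size: without the BOM the file should be three bytes smaller, since U+FEFF is encoded in UTF-8 as the bytes EF BB BF.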

Many parsers do not expect such a BOM, do not consider it whitespace, and fail. In Java, the BOM can be detected and removed like this:

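import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Inside a method that declares "throws IOException":
Path path = Paths.get("...");
String s = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
// Strip a leading U+FEFF (the BOM), if there is one.
String t = s.replaceFirst("^\uFEFF", "");
if (!s.equals(t)) {
    System.out.println("BOM character present in UTF-8 text");
    Files.write(path, t.getBytes(StandardCharsets.UTF_8)); // Replaces file!
}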
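
If you only want to confirm the suspicion without touching the file, it may be enough to look at the first three bytes, which is what a hex editor would show you. The following is a minimal sketch, not anything Weka provides; the class name BomCheck and the method hasUtf8Bom are made up for illustration, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    // U+FEFF encoded in UTF-8 is the byte sequence EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns true if the file starts with the three UTF-8 BOM bytes. */
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            for (int expected : UTF8_BOM) {
                // read() returns -1 at end of file, which never matches.
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(hasUtf8Bom(file) ? "BOM present" : "no BOM");
    }
}
```

This reads at most three bytes and never writes, so it is safe to run on the original ARFF file before deciding whether to re-save it without the BOM.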