.csv
100387C,254,73,93
100388D,2047,60,98
100388D,2736,62,9
100389E,951,82,90
100390F,2048,91,98
100411C,254,50,96
100412D,047,75,9
.arff
@relation test
@attribute Admno {100387C,100388.0,100389E,100390.0,100411C,100412.0}
@attribute Code {254,2047,2736,951,2048,254,047}
@attribute ore numeric
@attribute tend numeric
100387C,254,73,93
100388.0,2047,60,98
100388.0,2736,62,9
100389E,951,82,90
100390.0,2048,91,98
100411C,254,50,96
100412.0,047,75,9
Notice that the difference between these two files after conversion is that the trailing D became .0 in @attribute Admno. The conversion code I was using is below. So I was wondering what went wrong in the conversion. Thanks
import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Load the CSV file into a Weka Instances object...
CSVLoader loader = new CSVLoader();
loader.setSource(new File("C:\\test.csv"));
Instances data = loader.getDataSet();

// ...and write it back out in ARFF format.
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("C:\\test.arff"));
saver.writeBatch();
The reason you are getting 100388D as 100388.0 and 100390F as 100390.0 is that those values end with D and F. In Java, these letters are the numeric type suffixes for double and float (D stands for double, F stands for float). So when Weka parses the CSV, it believes those tokens are numbers of type double or float, converts them accordingly, and writes .0 instead of D and F.
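You can reproduce the behaviour with plain Java parsing, independent of Weka. This is a minimal sketch; the assumption is that Weka's column-type sniffing effectively follows the same standard Java numeric-literal rules:

```java
public class SuffixDemo {
    public static void main(String[] args) {
        // A trailing D/d or F/f is a valid Java floating-point type
        // suffix, so these IDs parse silently as numbers and lose
        // their letter.
        System.out.println(Double.parseDouble("100388D")); // prints 100388.0
        System.out.println(Double.parseDouble("100390F")); // prints 100390.0

        // "100389E" is NOT a valid number: 'E' starts an exponent
        // that must be followed by digits. That is why rows such as
        // 100389E survive the conversion unchanged.
        try {
            Double.parseDouble("100389E");
        } catch (NumberFormatException e) {
            System.out.println("100389E is not numeric");
        }
    }
}
```

This also explains why IDs ending in C or E in your data kept their letters: those strings throw a `NumberFormatException`, so they stay nominal.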
You can find a discussion here and the related documentation here.
To the best of my knowledge, there is no straightforward way to overcome this in Weka. But if this is an ID and does not take part in classification or clustering, you can simply ignore this attribute (for example, with Weka's Remove attribute filter) when you build a model on this data and apply it to your test data. Another way to overcome this is to change the attribute's values to ones that end with neither D nor F.