.csv
100387C,254,73,93
100388D,2047,60,98
100388D,2736,62,9
100389E,951,82,90
100390F,2048,91,98
100411C,254,50,96
100412D,047,75,9
.arff
@relation test
@attribute Admno {100387C,100388.0,100389E,100390.0,100411C,100412.0}
@attribute Code {254,2047,2736,951,2048,254,047}
@attribute ore numeric
@attribute tend numeric
100387C,254,73,93
100388.0,2047,60,98
100388.0,2736,62,9
100389E,951,82,90
100390.0,2048,91,98
100411C,254,50,96
100412.0,047,75,9
Notice that the difference between these two files after conversion is that the trailing D became .0 in @attribute Admno. The conversion code I was using is below. So I was wondering what went wrong in the conversion. Thanks
import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Load the CSV file into a Weka Instances object...
CSVLoader loader = new CSVLoader();
loader.setSource(new File("C:\\test.csv"));
Instances data = loader.getDataSet();

// ...and write it back out in ARFF format.
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("C:\\test.arff"));
saver.writeBatch();
The reason you are getting 100388D as 100388.0 and 100390F as 100390.0 is that those values end with D and F. In Java, these letters are the numeric type suffixes for double and float (D stands for double, F stands for float). So when Weka parses the CSV, it believes those tokens are numbers of type double or float, converts them accordingly, and writes .0 instead of D and F.
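You can reproduce the behaviour with plain Java parsing, independent of Weka. This is a minimal sketch; the assumption is that Weka's column-type sniffing effectively follows the same standard Java numeric-literal rules:

```java
public class SuffixDemo {
    public static void main(String[] args) {
        // A trailing D/d or F/f is a valid Java floating-point type
        // suffix, so these IDs parse silently as numbers and lose
        // their letter.
        System.out.println(Double.parseDouble("100388D")); // prints 100388.0
        System.out.println(Double.parseDouble("100390F")); // prints 100390.0

        // "100389E" is NOT a valid number: 'E' starts an exponent
        // that must be followed by digits. That is why rows such as
        // 100389E survive the conversion unchanged.
        try {
            Double.parseDouble("100389E");
        } catch (NumberFormatException e) {
            System.out.println("100389E is not numeric");
        }
    }
}
```

This also explains why IDs ending in C or E in your data kept their letters: those strings throw a `NumberFormatException`, so they stay nominal.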
You can find a discussion here and the related documentation here.
To the best of my knowledge, there is no straightforward way to overcome this in Weka. But if this is an ID and does not take part in classification or clustering, you can simply ignore this attribute (for example, with Weka's Remove attribute filter) when you build a model on this data and apply it to your test data. Another way to overcome this is to change the attribute's values to ones that end with neither D nor F.