I need to iteratively extend a Weka ARFF file with SparseInstance objects. Each time a new SparseInstance is added, the header might change, since the new instance might introduce additional attributes. I thought the Instances.mergeInstances method would solve my problem, but it does not: it requires the two datasets to have no shared attributes.
If this is not entirely clear, look at the following example:
Dataset1
a b c
1 2 3
4 5 6
Dataset2
c d
7 8
Merged result:
a b c d
1 2 3 ?
4 5 6 ?
? ? 7 8
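To pin down the merge semantics the example asks for (union of the headers, `?` wherever a row has no value), here is a minimal plain-Java sketch. It does not use Weka at all. `MergeSketch`, `unionHeader` and `align` are hypothetical names, and a dataset is assumed to be just a list of attribute names plus rows of string values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MergeSketch {
    // Union of all headers, keeping attributes in the order they are first seen.
    static List<String> unionHeader(List<List<String>> headers) {
        Set<String> names = new LinkedHashSet<>();
        for (List<String> header : headers) {
            names.addAll(header);
        }
        return new ArrayList<>(names);
    }

    // Re-aligns one row from its own header to the merged header,
    // using "?" (the ARFF missing-value marker) for attributes the row lacks.
    static List<String> align(List<String> merged, List<String> header, List<String> row) {
        List<String> out = new ArrayList<>();
        for (String name : merged) {
            int i = header.indexOf(name);
            out.add(i >= 0 ? row.get(i) : "?");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> h1 = List.of("a", "b", "c");
        List<String> h2 = List.of("c", "d");
        List<String> merged = unionHeader(List.of(h1, h2));
        System.out.println(String.join(" ", merged));
        for (List<String> row : List.of(List.of("1", "2", "3"), List.of("4", "5", "6"))) {
            System.out.println(String.join(" ", align(merged, h1, row)));
        }
        System.out.println(String.join(" ", align(merged, h2, List.of("7", "8"))));
    }
}
```

Running `main` prints exactly the merged result shown above. A shared attribute such as `c` ends up as a single column, which is the behaviour mergeInstances does not provide.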
The only solution I see at the moment is to parse the ARFF file by hand and merge it with string processing. Does anyone know of a better solution?
OK, I found the solution myself. The central part is the method Instances#insertAttributeAt, which appends a new attribute as the last one when the second parameter (the position) is model.numAttributes(). Here is some example code for numeric attributes; it is easy to adapt to other attribute types as well:
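import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import org.apache.commons.io.IOUtils;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;
import weka.core.converters.ArffLoader.ArffReader;
import weka.core.converters.ArffSaver;

// currentInstance, targetFile and LOGGER come from the surrounding code.
Map<String,String> currentInstanceFeatures = currentInstance.getFeatures();
Instances model = null;
try {
    if (targetFile.exists()) {
        // Load the existing dataset, including its current header.
        FileReader in = new FileReader(targetFile);
        try {
            BufferedReader reader = new BufferedReader(in);
            ArffReader arff = new ArffReader(reader);
            model = arff.getData();
        } finally {
            IOUtils.closeQuietly(in);
        }
    } else {
        // No file yet: start with an empty header (FastVector is the Weka 3.6 API).
        FastVector schema = new FastVector();
        model = new Instances("model", schema, 1);
    }
    Instance newInstance = new SparseInstance(0);
    newInstance.setDataset(model);
    for (Map.Entry<String,String> feature : currentInstanceFeatures.entrySet()) {
        Attribute attribute = model.attribute(feature.getKey());
        if (attribute == null) {
            // Unknown feature: append a new numeric attribute at the end of the header.
            attribute = new Attribute(feature.getKey());
            model.insertAttributeAt(attribute, model.numAttributes());
            // Re-fetch the attribute from the dataset so its index matches the header.
            attribute = model.attribute(feature.getKey());
        }
        // setValue(Attribute, String) only works for nominal/string attributes,
        // so numeric values have to be parsed first.
        newInstance.setValue(attribute, Double.parseDouble(feature.getValue()));
    }
    model.add(newInstance);
    model.compactify();
    ArffSaver saver = new ArffSaver();
    saver.setInstances(model);
    saver.setFile(targetFile);
    LOGGER.debug("Saving dataset to: " + targetFile.getAbsoluteFile());
    saver.writeBatch();
} catch (IOException e) {
    throw new IllegalArgumentException(e);
}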
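The grow-the-header-on-demand loop can also be sketched without Weka, which may help when checking what a sparse data line should look like. This is an illustration only: `SparseRowSketch` and `toSparseRow` are hypothetical names, feature values are assumed to be numeric, and the output is one data line in the sparse ARFF form `{index value, ...}` with 0-based attribute indices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class SparseRowSketch {
    // Hypothetical helper mirroring the loop above: look up each feature name
    // in the header, append it at the end if it is new (the role played by
    // insertAttributeAt(attribute, numAttributes())), and record its value
    // under that attribute's 0-based index.
    static String toSparseRow(List<String> header, Map<String, Double> features) {
        TreeMap<Integer, Double> byIndex = new TreeMap<>(); // keeps indices ascending
        for (Map.Entry<String, Double> feature : features.entrySet()) {
            int index = header.indexOf(feature.getKey());
            if (index < 0) {
                header.add(feature.getKey());
                index = header.size() - 1;
            }
            byIndex.put(index, feature.getValue());
        }
        // Sparse ARFF data line: {index value, index value, ...}
        StringJoiner row = new StringJoiner(", ", "{", "}");
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            row.add(e.getKey() + " " + e.getValue());
        }
        return row.toString();
    }

    public static void main(String[] args) {
        List<String> header = new ArrayList<>(List.of("a", "b", "c"));
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("c", 7.0);
        features.put("d", 8.0);
        System.out.println(toSparseRow(header, features)); // {2 7.0, 3 8.0}
        System.out.println(header);                         // [a, b, c, d]
    }
}
```

Keep in mind that in sparse ARFF an omitted entry means 0, not missing; a genuinely unknown value has to be written explicitly as `?`. The line produced here is therefore read as `a = 0, b = 0`, not `a = ?, b = ?`.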