I have the following code:
import weka.core.Attribute
import weka.core.DenseInstance
import weka.core.Instance
import weka.core.Instances
import weka.filters.unsupervised.attribute.StringToWordVector
ArrayList<Attribute> attributes = new ArrayList<>()
attributes.add(new Attribute("tweet", true))
ArrayList<String> theLines = new ArrayList<>()
File cleanestTweets = new File("cleanestTweets.txt")
File savedResults = new File("savedResults.arff")
Instances instances
try {
    // Read the cleaned tweets, one per line
    Scanner console = new Scanner(cleanestTweets)
    while (console.hasNextLine()) {
        String line = console.nextLine()
        theLines.add(line)
    }
    // Build a dataset with a single string attribute holding each tweet
    Instance ins = new DenseInstance(1)
    instances = new Instances("TwitterData", attributes, theLines.size())
    theLines.each { it ->
        ins.setValue(attributes[0], it)
        instances.add(ins)
    }
    // Configure the filter that should produce the word vectors
    StringToWordVector filter = new StringToWordVector()
    filter.setInputFormat(instances)
    filter.setOutputWordCounts(true)
    filter.setTFTransform(true)
    filter.setDictionaryFileToSaveTo(savedResults)
    filter.getDictionaryFileToSaveTo()
} catch (IOException e) {
}
The code that creates the instances works fine. I am then trying to build a term-document matrix (TDM) and write it out to savedResults.arff. When I run the code, nothing is written to the file, and I'm not sure why. I've read the documentation, but it doesn't mention this.
StringToWordVector filter = new StringToWordVector()
filter.setInputFormat(instances)
filter.setDictionaryFileToSaveTo(savedResults)
filter.setOutputWordCounts(true)
filter.setTFTransform(true)
Instances dataFiltered = weka.filters.Filter.useFilter(instances, filter)
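This does write the words and their occurrence counts to the file. It looks like configuring the filter is not enough on its own: you have to run the instances through it explicitly with Filter.useFilter, which returns a new Instances object, and the dictionary file only appears once the filter has actually processed the data. I used this question to come to this conclusion.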
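
For anyone unsure what the filtered data and the dictionary represent, here is a rough plain-Java sketch (it should also paste into a Groovy 3+ script) that builds a tiny term-document matrix by hand. It does not use Weka at all. The class and method names are made up for illustration, the tokenization (lowercase, split on whitespace) is my own simplification rather than Weka's tokenizer, and the log(1 + count) step is what I understand setTFTransform(true) to apply on top of the word counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Weka.
class TermDocumentSketch {

    // Count how often each lowercased, whitespace-separated token occurs in one document.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : document.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Dictionary: every word mapped to its total count across all documents.
    static Map<String, Integer> buildDictionary(List<String> documents) {
        Map<String, Integer> dictionary = new TreeMap<>();
        for (String document : documents) {
            countWords(document).forEach((word, n) -> dictionary.merge(word, n, Integer::sum));
        }
        return dictionary;
    }

    // One matrix row: log(1 + count) for each dictionary word, in dictionary (sorted) order.
    static double[] toRow(String document, Map<String, Integer> dictionary) {
        Map<String, Integer> counts = countWords(document);
        double[] row = new double[dictionary.size()];
        int i = 0;
        for (String word : dictionary.keySet()) {
            row[i++] = Math.log(1 + counts.getOrDefault(word, 0));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("good morning twitter", "good good coffee");
        Map<String, Integer> dictionary = buildDictionary(tweets);
        System.out.println(dictionary); // prints {coffee=1, good=3, morning=1, twitter=1}
        for (String tweet : tweets) {
            System.out.println(Arrays.toString(toRow(tweet, dictionary)));
        }
    }
}
```

The point of the sketch is only that nothing exists until the documents are actually processed: the dictionary and the rows are both by-products of running the data through, which matches what I saw with Filter.useFilter.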