I'm fairly new to WEKA and ARFF files, and I'm currently working with its GUI. What I'm confused about is how to do my prediction (classification) with multiple ARFF files.
For example, file A has 3 attributes, "ID", "attribute_1", and "attribute_2", while file B has 2 attributes, "ID" and "Scores" (the main attribute used for prediction).
The problem is that each line of data in file A is unique, but the data in file B is repetitive. Both files are related by their "ID". In other words, file B stores a set of "scores" for each element in file A.
Is there any suggestion on how I could join files A and B together? Or is there any way to work around this in WEKA?
Weka needs one "flattened" table, i.e., a single ARFF file. This process is also called denormalization. There's a Weka package (Denormalize) that contains a filter to perform this operation.
There is an example of how transactional data can be flattened here: https://weka.wikispaces.com/How+can+I+use+transactional+data+in+Weka%3F
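To make "flattened" concrete, here is a rough Python sketch of collapsing already-joined rows into one row per "ID". It is not the Weka filter's implementation. Averaging the scores is only an assumption chosen for illustration, and the function name `flatten_by_id`, the output column `mean_score`, and the sample rows are all made up.

```python
from collections import defaultdict

def flatten_by_id(rows, key="ID", score="Scores"):
    """Collapse rows sharing the same ID into one row (hypothetical helper).

    Assumption: the repeated scores are numeric and are summarised by their mean.
    """
    scores_by_id = defaultdict(list)
    first_row = {}
    for row in rows:
        scores_by_id[row[key]].append(float(row[score]))
        first_row.setdefault(row[key], row)  # keep the other attributes once

    flat = []
    for id_value, scores in scores_by_id.items():
        out = {k: v for k, v in first_row[id_value].items() if k != score}
        out["mean_score"] = sum(scores) / len(scores)
        flat.append(out)
    return flat

# Made-up rows standing in for the result of joining file A and file B.
joined_rows = [
    {"ID": "1", "attribute_1": "red", "attribute_2": "10", "Scores": "0.5"},
    {"ID": "1", "attribute_1": "red", "attribute_2": "10", "Scores": "1.5"},
    {"ID": "2", "attribute_1": "blue", "attribute_2": "20", "Scores": "0.9"},
]
flat_rows = flatten_by_id(joined_rows)
for row in flat_rows:
    print(row)
```

Whether a mean, a maximum, or one column per score is appropriate depends on what the scores represent.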
Before using the filter, you would have to merge your two files together. If you have CSV files or something similar, you could achieve this with Excel; see for example:
https://superuser.com/questions/420635/how-do-i-join-two-worksheets-in-excel-as-i-would-in-sql
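If Excel is not convenient, the same merge can be sketched in a few lines of Python using only the standard library. This assumes both files have been exported to CSV with an "ID" column. The helper name `join_on_id` and the sample data are hypothetical, and scores whose "ID" does not appear in file A are simply skipped.

```python
import csv
import io

def join_on_id(rows_a, rows_b, key="ID"):
    """Inner join: one output row per row of B, extended with A's attributes."""
    a_by_id = {row[key]: row for row in rows_a}  # IDs in file A are unique
    joined = []
    for row_b in rows_b:
        row_a = a_by_id.get(row_b[key])
        if row_a is None:
            continue  # no matching ID in file A; skipped in this sketch
        joined.append({**row_a, **row_b})
    return joined

# Made-up sample data standing in for CSV exports of file A and file B.
FILE_A = "ID,attribute_1,attribute_2\n1,red,10\n2,blue,20\n"
FILE_B = "ID,Scores\n1,0.5\n1,0.7\n2,0.9\n"

rows_a = list(csv.DictReader(io.StringIO(FILE_A)))
rows_b = list(csv.DictReader(io.StringIO(FILE_B)))
merged = join_on_id(rows_a, rows_b)

# Write the merged table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["ID", "attribute_1", "attribute_2", "Scores"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(merged)
print(out.getvalue())
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`. The resulting CSV can then be loaded in the WEKA GUI and saved as ARFF.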