Tags: r, machine-learning, weka, arff

WEKA prediction with multiple ARFF files


I'm fairly new to WEKA and ARFF files, and I'm currently working with its GUI. What confuses me is how to run a prediction (classification) when the data is spread across multiple ARFF files.

For example, file A has three attributes ("ID", "attribute_1", "attribute_2"), while file B has two ("ID" and "Scores", the main attribute used for prediction).

The problem is that each row in file A is unique, whereas the IDs in file B repeat. The two files are related by their "ID": file B stores a set of "Scores" for each element in file A.

Are there any suggestions on how I could join files A and B? Or is there a way to work around this in WEKA?


Solution

  • Weka needs a single "flattened" table, i.e., one ARFF file. Producing it is also called denormalization. There is a Weka package (Denormalize) that contains a filter to perform this operation.

    There is an example of how transactional data can be flattened here: https://weka.wikispaces.com/How+can+I+use+transactional+data+in+Weka%3F

    Before using the filter, you would have to merge your two files into one. If you have CSV files or something similar, you could do this in Excel; see for example:

    https://superuser.com/questions/420635/how-do-i-join-two-worksheets-in-excel-as-i-would-in-sql
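
If Excel is not convenient, the merge step described above can also be scripted. The following is only a minimal sketch in Python (standard library only), not part of Weka itself. It assumes both files have been exported as CSV, that every attribute is numeric, and that the sample values, the relation name "joined", and the helper names `read_rows`, `join_on_id`, and `to_arff` are all hypothetical and chosen purely for illustration. It joins the two tables on "ID" and writes the result as a single ARFF table.

```python
import csv
import io

# Hypothetical sample data standing in for CSV exports of file A and file B.
FILE_A = """ID,attribute_1,attribute_2
1,0.5,2.0
2,1.5,3.0
"""
FILE_B = """ID,Scores
1,10
1,20
2,30
"""

# Assumption: all attributes are numeric. Adjust the types to match your data.
ATTRIBUTES = [
    ("ID", "numeric"),
    ("attribute_1", "numeric"),
    ("attribute_2", "numeric"),
    ("Scores", "numeric"),
]


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_id(rows_a, rows_b):
    """Inner join on "ID": one output row per row of B, with A's attributes attached."""
    by_id = {row["ID"]: row for row in rows_a}  # IDs in A are unique
    joined = []
    for row_b in rows_b:
        row_a = by_id.get(row_b["ID"])
        if row_a is not None:  # skip scores whose ID is missing from A
            joined.append({**row_a, "Scores": row_b["Scores"]})
    return joined


def to_arff(rows, relation, attributes):
    """Render rows as ARFF text: header declarations followed by the @data section."""
    lines = [f"@relation {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@attribute {name} {arff_type}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[name] for name, _ in attributes))
    return "\n".join(lines) + "\n"


joined_rows = join_on_id(read_rows(FILE_A), read_rows(FILE_B))
arff_text = to_arff(joined_rows, "joined", ATTRIBUTES)
print(arff_text)
```

The joined table has one row per (ID, score) pair, so the attributes from file A are repeated for every score that belongs to the same ID. Whether you then keep it in this form or flatten it further to one row per ID (for example with the Denormalize filter) depends on how you want to model the scores.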