Tags: bash, csv, weka, file-conversion, arff

ARFF file extension to csv binary executable


Thanks in advance for the help.

I'm looking for a binary executable to convert an .arff into a .csv in a bash script. Ideally something that I could run along the lines of

#! /bin/sh
... some stuff....
conversionFunc input.arff output.csv
... some more stuff ...

While looking into writing this myself, I found that Weka provides a Java library that would let me do this. However, I could not find it: I have Weka installed on my Mac, and after searching the installation I was still unable to locate the library.

Does anyone know where I can find such an executable, or can anyone point me to where I could get hold of the Weka Java library that would let me write it myself?


Solution

  • Clone this GitHub repository, which contains an arff2csv tool in its "tools" subdirectory:

    https://github.com/jeroenjanssens/data-science-at-the-command-line

    arff2csv is designed to run in pipelines of Unix command-line tools.

    arff2csv is a one-line shell script that calls another shell script, which in turn calls weka.jar, so it needs Java installed on your machine. Note that arff2csv needs Weka version 3.6; in my experiments the newer 3.7 did not work.

    The script expects this environment variable to be set to the directory that contains weka.jar:

    export WEKAPATH=/path/to/wekajar-dirname
    

    Then you can run:

    cat /opt/smallapps/weka-stable/data/breast-cancer.arff | arff2csv > breast-cancer.arff.csv
    

    Large ARFF files take some time to process.

    See Jeroen Janssens's book (linked in the repository README) for a bit more info.
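
Since the question asks for a file-to-file call rather than a pipe, the usage above can be wrapped in a small shell function. This is only a sketch: convert_arff and weka_csvsaver are hypothetical names, not part of the repository. The first function assumes arff2csv is on your PATH and WEKAPATH is exported as described. The second skips the repository scripts and calls Weka's weka.core.converters.CSVSaver class directly; it assumes the jar is named weka.jar and lives in the WEKAPATH directory, and I have not checked how it behaves across Weka versions.

```shell
#!/bin/sh
# Sketch only: convert_arff and weka_csvsaver are hypothetical helper names.

# File-to-file wrapper around the arff2csv filter from the repository.
# Assumes arff2csv is on PATH and WEKAPATH is exported.
convert_arff() {
    if [ "$#" -ne 2 ]; then
        echo "usage: convert_arff input.arff output.csv" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "convert_arff: cannot read '$1'" >&2
        return 1
    fi
    # arff2csv reads ARFF on stdin and writes CSV on stdout.
    arff2csv < "$1" > "$2"
}

# Alternative without the repository scripts: call Weka's CSVSaver directly.
# Assumes the jar is named weka.jar and sits in the directory $WEKAPATH.
weka_csvsaver() {
    java -cp "$WEKAPATH/weka.jar" weka.core.converters.CSVSaver -i "$1" -o "$2"
}
```

After sourcing this in your script, `convert_arff input.arff output.csv` gives the interface the question asks for. Note that if the conversion fails partway, an empty or partial output file is left behind, so check the exit status before using the result.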