weka, feature-extraction, arff, audio-analysis, audeering-opensmile

opensmile in Python save as .arff


I am using Python with the opensmile library. My goal is to generate *.arff files for use in the Weka 3 machine learning tool. My problem is that it is unclear to me how to save the extracted features to an *.arff file.

For example:

import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
y = smile.process_file('audio.wav')

# TODO: save y as ARFF

It should be possible, since there are questions about the generated files (e.g. here). However, I can't find anything specific about how to do it.


Solution

  • Instead of generating ARFF directly, you could generate a CSV file in a format that Weka can load:

    import csv
    import pandas as pd
    import opensmile
    
    # configure feature generation
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    
    # the audio files to generate features from
    audio_files = [
        '000000.wav',
        '000001.wav',
        '000002.wav',
    ]
    
    # generate features
    ys = []
    for audio_file in audio_files:
        y = smile.process_file(audio_file)
        ys.append(y)
    
    # combine features and save as CSV
    data = pd.concat(ys)
    data.to_csv('audio.csv', quotechar='\'', quoting=csv.QUOTE_NONNUMERIC)
    

    As a second (and optional) step, convert the CSV file to ARFF using Weka's CSVLoader class from the command line:

    java -cp weka.jar weka.core.converters.CSVLoader audio.csv > audio.arff
    

    NB: You will need to adjust the paths to audio files, weka.jar, CSV and ARFF file to fit your environment, of course.
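
If you would rather skip the Weka conversion step, a plain-text ARFF file can also be written by hand, since the format is just a header of `@attribute` lines followed by comma-separated rows. The following is only a minimal sketch, not part of opensmile or Weka: the helper name `to_arff` is hypothetical, and it assumes every feature is numeric (which is the case for the feature values themselves, but not for the file/start/end index).

```python
import math


def to_arff(relation, names, rows):
    """Return ARFF text for purely numeric data (hypothetical helper).

    relation: name of the dataset
    names:    list of attribute (feature) names
    rows:     list of rows, each a list of numbers; NaN becomes '?'
    """
    def quote(value):
        # Quote every name so characters such as '[' or spaces are safe.
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    lines = ["@relation " + quote(relation), ""]
    lines += ["@attribute " + quote(name) + " numeric" for name in names]
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match number of attributes")
        # ARFF marks missing values with '?'.
        cells = ("?" if math.isnan(v) else repr(float(v)) for v in row)
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"
```

With the `data` frame from the code above, this could be used as `open('audio.arff', 'w').write(to_arff('audio', list(data.columns), data.to_numpy().tolist()))`. Note that this drops the index (file name and segment times), so add those back as extra attributes if you need them in Weka.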