
opensmile in Python save as .arff

I am using Python with the opensmile library. My goal is to generate *.arff files to use in the Weka 3 ML tool. My problem is that it is unclear to me how to save the extracted features to an *.arff file.

For example:

import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
y = smile.process_file('audio.wav')

# TODO: save y as ARFF

It should be possible, since there are questions about the generated files (e.g. here). However, I can't find anything specific about it.

Solution

Instead of generating ARFF directly, you could generate a CSV file in a format that Weka can load:

import csv
import pandas as pd
import opensmile

# configure feature generation
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# the audio files to generate features from
audio_files = [
    '000000.wav',
    '000001.wav',
    '000002.wav',
]

# generate features
ys = []
for audio_file in audio_files:
    y = smile.process_file(audio_file)
    ys.append(y)

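# combine the per-file feature tables into one DataFrame and save it as CSV;
# the index (file, start, end) is written too, and every non-numeric field
# is enclosed in single quotes
data = pd.concat(ys)
data.to_csv('audio.csv', quotechar='\'', quoting=csv.QUOTE_NONNUMERIC)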
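To see what the quotechar/quoting arguments of the to_csv call do, here is the same configuration applied with Python's csv module directly (pandas hands these two arguments on to it, although its default line terminator may differ). A minimal sketch with hypothetical column names and values: non-numeric fields are enclosed in single quotes, numbers are left bare, and reading the text back with the same settings turns the bare fields into floats.

```python
import csv
import io

# Write a header and one data row with the same settings as the to_csv call:
# single quotes as the quote character, every non-numeric field quoted.
buf = io.StringIO()
writer = csv.writer(buf, quotechar="'", quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(["file", "F0_mean"])  # hypothetical column names
writer.writerow(["000000.wav", 1.5])  # hypothetical values
print(buf.getvalue())

# Reading back with the same settings converts unquoted fields to float.
buf.seek(0)
rows = list(csv.reader(buf, quotechar="'", quoting=csv.QUOTE_NONNUMERIC))
print(rows)
```

Quoting the strings keeps file names or timestamps that contain spaces or commas together as a single field.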

As a second (and optional) step, convert the CSV file to ARFF with Weka's CSVLoader class from the command line:

java -cp weka.jar weka.core.converters.CSVLoader audio.csv > audio.arff

NB: You will, of course, need to adjust the paths to the audio files, weka.jar, and the CSV and ARFF files to fit your environment.