I have an ARFF file containing sentences in Persian. In the @data section, each sentence is followed by a word that gives its class. I need to use SMO for classification. My questions:
1) Is it necessary to convert the sentences to vectors?
2) I applied the "StringToWordVector" filter, but SMO is still inactive and doesn't work (the same goes for other algorithms such as Naive Bayes).
How can I use this text data with SMO?
The above picture is a very small sample file.
file sample: https://www.dropbox.com/s/ohpyortve8jbwhe/shoor.arff?dl=0
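First, you need to apply the "StringToWordVector" filter. Then, on the Classify tab, change the target class to "(Nom) class". This step is needed because the filter appends the generated word attributes after the existing ones, so the class is no longer the last attribute, which is the one Weka selects as the target by default. This is enough to enable the Naive Bayes and SVM algorithms. I downloaded the dataset, and it worked well.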
You can follow this tutorial: https://www.youtube.com/watch?v=zlVJ2_N_Olo
Hope this helps.
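On the first question: yes, classifiers such as SMO work on numeric attributes, so each sentence has to become a vector first. The following is a minimal pure-Python sketch of the idea behind a word-count vector, not Weka's actual implementation. The function names and the two Persian sentences are made up for illustration, and the tokenisation is a plain whitespace split.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Assign a column index to every distinct whitespace-separated token."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            # setdefault keeps the first index a token was given
            vocab.setdefault(token, len(vocab))
    return vocab


def to_count_vector(sentence, vocab):
    """Count how often each vocabulary token occurs in one sentence."""
    counts = Counter(sentence.split())
    # dicts keep insertion order, so this follows the column indices
    return [counts.get(token, 0) for token in vocab]


# Hypothetical sample sentences (not taken from shoor.arff):
# "the weather is good today" / "the weather is bad today"
sentences = ["هوا امروز خوب است", "هوا امروز بد است"]

vocab = build_vocabulary(sentences)
vectors = [to_count_vector(s, vocab) for s in sentences]

for row in vectors:
    print(row)
```

Each sentence becomes one row with one column per distinct word, which is the kind of table a classifier can consume. The two rows differ only in the columns for the words that differ between the sentences.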
import arff
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Read all rows: column 0 is the sentence, column 1 is its class
data = list(arff.load('shoor.arff'))
text = []
label = []
for r in data:
    if len(r) > 1:
        text.append(r[0])
        label.append(r[1])

# TF-IDF rows are L2-normalised by default, so tfidf * tfidf.T holds the
# cosine similarity of each sentence to every sentence in the file
tfidf = TfidfVectorizer().fit_transform(text)
features = (tfidf * tfidf.T).toarray()

# Hold out half of the sentences for testing
X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.5, random_state=0)
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
clf.score(X_test, y_test)
# Output: 1.0