java string classification weka arff

String array attribute in weka


I need a weka training file (arff) to have a name (String) and an array of Strings associated with that name, such that the classifier associates the name with those strings when I run it on any text. For this task, how do I make an attribute in weka that is a String array? Or is there any alternative way to do this?

(I'm using Naive Bayes Classifier)

For example: Deepika Shah, Voracious reader, funny, pretty

So if a sentence contains any of the strings listed above for Deepika Shah, it should be classified as being about Deepika Shah.

EDIT: I need to classify a sentence as being about a name, using the words and phrases in the sentence. So I'm giving the classifier a set of Strings together with the name they are associated with, and it should find the class (the name) from the sentence. Alternatively, it could work on features I extract from the sentence (assume I have already extracted them).


Solution

  • Your arff file needs to be in this format:

    @Relation testRelation
    
    @attribute firstAtr string
    @attribute secondAtr string
    @attribute thirdAtr string
    % values that contain spaces must be quoted
    @attribute yourClass {"Deepika Shah", secondClass, ...other classes listed here}
    
    @data
    "Voracious reader","funny","pretty","Deepika Shah"
    
    ...more data here
    

    Then you can import your arff file in weka.

    Now you need to transform the String values into numbers. To do that, use the weka->filters->unsupervised->attribute->StringToWordVector filter, found in the Filter section of the Preprocess tab. You can click on the filter name to tune parameters such as the term representation (TF, TF-IDF), stopwords, stemming algorithm, n-grams, etc. Then click Apply.

    Once this step has finished, you can move to the Classify tab, select your classifier, and run the classification.

    Note: You need to select the nominal class attribute, (Nom) yourClass, in the drop-down located under Test options before the Start button becomes clickable.

    Note 2: If your string attributes only take a fixed set of values (funny, sad, neutral, etc.), you can declare them as nominal attributes instead of strings.


    P.S. A nice example incorporating all of the above can be found here: https://www.youtube.com/watch?v=jSZ9jQy1sfE
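Since the question is tagged java, the ARFF layout from the answer could also be generated from code instead of being typed by hand. The following is only a minimal sketch using the Java standard library, not Weka's own API: the class name `ArffSketch`, the attribute names `atr1`, `atr2`, ... and the backslash escaping inside `quote` are assumptions made for illustration. It quotes every value so that names and phrases containing spaces stay in one piece.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Weka): writes name/phrase data in the ARFF layout shown above.
class ArffSketch {

    // Trim the value, escape backslashes and double quotes, and wrap it in double quotes.
    // The escaping rule is an assumption; check it against your Weka version.
    static String quote(String value) {
        String cleaned = value.trim().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + cleaned + "\"";
    }

    // Build the ARFF text: attrCount string attributes followed by one nominal class attribute.
    // Each name must come with exactly attrCount phrases.
    static String buildArff(String relation, int attrCount, Map<String, List<String>> phrasesByName) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int i = 1; i <= attrCount; i++) {
            sb.append("@attribute atr").append(i).append(" string\n");
        }

        // The class attribute lists every name as a quoted nominal value.
        List<String> classes = new ArrayList<>();
        for (String name : phrasesByName.keySet()) {
            classes.add(quote(name));
        }
        sb.append("@attribute yourClass {").append(String.join(",", classes)).append("}\n\n");

        // One data row per name: its phrases first, then the class label.
        sb.append("@data\n");
        for (Map.Entry<String, List<String>> entry : phrasesByName.entrySet()) {
            if (entry.getValue().size() != attrCount) {
                throw new IllegalArgumentException("Expected " + attrCount + " phrases for " + entry.getKey());
            }
            List<String> cells = new ArrayList<>();
            for (String phrase : entry.getValue()) {
                cells.add(quote(phrase));
            }
            cells.add(quote(entry.getKey()));
            sb.append(String.join(",", cells)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("Deepika Shah", List.of("Voracious reader", "funny", " pretty"));
        System.out.print(buildArff("testRelation", 3, data));
    }
}
```

The printed text can be saved to a `.arff` file and opened in the Preprocess tab as described in the answer.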
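To make the "String values to numbers" step more concrete, here is a deliberately simplified, standard-library-only illustration of the idea behind a word-vector conversion: build a vocabulary from the training strings, then describe any sentence by how often each vocabulary word occurs in it. This is not Weka's implementation. The class `WordVectorSketch`, its tokenizer, and the use of raw counts are assumptions, and the real StringToWordVector filter has its own tokenizer, defaults and options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration of a bag-of-words conversion (not Weka code).
class WordVectorSketch {

    // Lower-case the text and split it on anything that is not an ASCII letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Collect every distinct word of the training strings, in alphabetical order.
    static List<String> vocabulary(List<String> texts) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String text : texts) {
            vocab.addAll(tokenize(text));
        }
        return new ArrayList<>(vocab);
    }

    // Count how often each vocabulary word occurs in the text; unknown words are ignored.
    static int[] toCounts(String text, List<String> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(text)) {
            int index = vocab.indexOf(token);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> training = List.of("Voracious reader", "funny", "pretty");
        List<String> vocab = vocabulary(training);
        System.out.println(vocab);
        System.out.println(Arrays.toString(toCounts("A funny, funny reader", vocab)));
    }
}
```

A classifier such as Naive Bayes then works on these numeric columns rather than on the raw strings, which is why the filter has to be applied before moving to the Classify tab.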