
Same Instances header (ARFF) for all my database queries


I am using InstanceQuery with SQL queries to construct my Instances. However, the query results do not always come back in the same order, which is normal in SQL. Because of this, Instances constructed from different SQL queries have different headers. A simple example is shown below. I suspect my results change because of this behavior.

Header 1

@attribute duration numeric
@attribute protocol_type {tcp,udp}
@attribute service {http,domain_u}
@attribute flag {SF}

Header 2

@attribute duration numeric
@attribute protocol_type {tcp}
@attribute service {pm_dump,pop_2,pop_3}
@attribute flag {SF,S0,SH}
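
To illustrate why I think this matters: as far as I know, Weka stores a nominal value internally as its index in the declared value list, so the same index can stand for different labels under different headers. Below is a minimal plain-Java sketch with no Weka dependency. The class `NominalIndexDemo` and the helper `parseNominalValues` are names made up for this illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class NominalIndexDemo {

    // Hypothetical helper: extracts the labels from a nominal declaration
    // such as "@attribute service {http,domain_u}". Quoted labels and
    // labels that contain commas are not handled in this sketch.
    public static List<String> parseNominalValues(String attributeLine) {
        int open = attributeLine.indexOf('{');
        int close = attributeLine.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException(
                    "Not a nominal attribute: " + attributeLine);
        }
        List<String> values = new ArrayList<>();
        for (String v : attributeLine.substring(open + 1, close).split(",")) {
            values.add(v.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> header1 =
                parseNominalValues("@attribute service {http,domain_u}");
        List<String> header2 =
                parseNominalValues("@attribute service {pm_dump,pop_2,pop_3}");
        // Index 0 names a different label under each header.
        System.out.println(header1.get(0) + " vs " + header2.get(0));
        // "http" is not declared at all in the second header.
        System.out.println(header2.indexOf("http"));
    }
}
```

If a model is trained on data with one header and applied to data with the other, index 0 of `service` would refer to different labels, which could explain inconsistent results.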

My question is: how can I supply the correct header information when the Instances are constructed?

Is something like the workflow below possible?

  1. Get pre-prepared header information from an ARFF file or another place.
  2. Give this header information to the Instances construction.
  3. Call the SQL function and get Instances (header + data).

I am using the following function to get Instances from the database.

public static Instances getInstanceDataFromDatabase(String pSql
                                      ,String pInstanceRelationName){
    try {
        // InstanceQuery runs the SQL and builds the header from the result set
        InstanceQuery query = new InstanceQuery();

        query.setUsername(username);
        query.setPassword(password);
        query.setQuery(pSql);

        Instances data = query.retrieveInstances();
        data.setRelationName(pInstanceRelationName);

        // default to the last attribute as the class attribute
        if (data.classIndex() == -1)
        {
              data.setClassIndex(data.numAttributes() - 1);
        }
        return data;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Solution

  • I tried various approaches to my problem, but it seems that Weka's internal API does not currently offer a solution. I modified the weka.core.Instances append command-line code for my purposes. This code is also given in this answer.

    Based on this, here is my solution. I created a SampleWithKnownHeader.arff file, which contains the correct header values. I read this file with the following code.

    public static Instances getSampleInstances() {
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader reader = new BufferedReader(new FileReader(
                "datas\\SampleWithKnownHeader.arff"))) {
            Instances data = new Instances(reader);
            // use the last attribute as the class attribute
            data.setClassIndex(data.numAttributes() - 1);
            return data;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    

    After that, I use the following code to create the instances. I had to use a StringBuilder and the string value of each instance, and then I save the resulting string to a file.

    public static void main(String[] args) {
    
        Instances SampleInstance = MyUtilsForWeka.getSampleInstances();
    
        DataSource source1 = new DataSource(SampleInstance);
    
        Instances data2 = InstancesFromDatabase
                .getInstanceDataFromDatabase(DatabaseQueries.WEKALIST_QUESTION1);
    
        MyUtilsForWeka.saveInstancesToFile(data2, "fromDatabase.arff");
    
        DataSource source2 = new DataSource(data2);
    
        Instances structure1;
        Instances structure2;
        StringBuilder sb = new StringBuilder();
        try {
            structure1 = source1.getStructure();
            sb.append(structure1);
            structure2 = source2.getStructure();
            while (source2.hasMoreElements(structure2)) {
                String elementAsString = source2.nextElement(structure2)
                        .toString();
                sb.append(elementAsString);
                sb.append("\n");
    
            }
    
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    
        MyUtilsForWeka.saveInstancesToFile(sb.toString(), "combined.arff");
    
    }
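
The same idea can be sketched at the text level without Weka: keep everything up to and including the `@data` line of the known header, then append only the data rows of the database result. This is a simplified illustration under stated assumptions, not the code I actually ran. The class `ArffHeaderSplice` and the method `combine` are hypothetical names, and the sketch assumes both inputs are complete ARFF texts with the same attributes in the same order.

```java
import java.util.Locale;

public class ArffHeaderSplice {

    // True for the line that starts the data section, e.g. "@data" or "@DATA".
    private static boolean isDataMarker(String line) {
        return line.trim().toLowerCase(Locale.ROOT).startsWith("@data");
    }

    // Returns the header of knownHeaderArff (through its @data line)
    // followed by the non-blank data rows of dataArff.
    public static String combine(String knownHeaderArff, String dataArff) {
        StringBuilder sb = new StringBuilder();

        // Copy the known header; any sample rows after @data are dropped.
        boolean foundHeaderEnd = false;
        for (String line : knownHeaderArff.split("\\R")) {
            sb.append(line).append('\n');
            if (isDataMarker(line)) {
                foundHeaderEnd = true;
                break;
            }
        }
        if (!foundHeaderEnd) {
            throw new IllegalArgumentException("Known header has no @data line");
        }

        // Skip the header of the second text and copy only its rows.
        boolean inData = false;
        for (String line : dataArff.split("\\R")) {
            if (inData) {
                if (!line.trim().isEmpty()) {
                    sb.append(line).append('\n');
                }
            } else if (isDataMarker(line)) {
                inData = true;
            }
        }
        if (!inData) {
            throw new IllegalArgumentException("Data text has no @data line");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String known = "@relation sample\n@attribute flag {SF,S0,SH}\n@data\nSF\n";
        String fromDb = "@relation query\n@attribute flag {SF}\n@data\nSF\nSF\n";
        System.out.print(combine(known, fromDb));
    }
}
```

The result carries the full `{SF,S0,SH}` declaration even though the database rows only contain `SF`, which is the effect I wanted from combined.arff.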
    

    My code for saving the instances to a file is below.

    public static void saveInstancesToFile(String contents, String filename) {
        // try-with-resources flushes and closes the writer even on failure
        try (BufferedWriter out = new BufferedWriter(new FileWriter(filename))) {
            out.write(contents);
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    This solves my problem, but I wonder whether a more elegant solution exists.
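
One caveat I would check: this approach only works if every nominal value in the database rows is already declared in the known header, since I believe Weka's ARFF reader rejects rows with undeclared nominal values. A minimal sketch of such a check follows. `DeclaredValueCheck` and `undeclared` are hypothetical names, the label `REJ` in the usage is only an illustrative value, and the naive comma split does not handle quoted values.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeclaredValueCheck {

    // Returns the labels found in column `col` of the comma-separated rows
    // that are missing from `declared`. "?" is the ARFF missing-value marker
    // and is always accepted.
    public static Set<String> undeclared(List<String> declared,
                                         List<String> rows, int col) {
        Set<String> missing = new LinkedHashSet<>();
        for (String row : rows) {
            String[] cells = row.split(",");
            if (col >= cells.length) {
                continue; // row too short; ignored in this sketch
            }
            String value = cells[col].trim();
            if (!value.equals("?") && !declared.contains(value)) {
                missing.add(value);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> flagValues = Arrays.asList("SF", "S0", "SH");
        List<String> rows = Arrays.asList(
                "0,tcp,pop_3,SF",
                "0,tcp,pop_3,REJ",
                "0,tcp,pop_3,?");
        // Lists the labels that would need to be added to the known header.
        System.out.println(undeclared(flagValues, rows, 3));
    }
}
```

An empty result means the rows can be placed under the known header as they are.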