java text weka

How can I extract the data below and write it to another file?


L=40
i:Classifier_name=meka.classifiers.multilabel.BR
i:Classifier_ops=[-W, weka.classifiers.rules.ZeroR]
i:Classifier_info=
i:Dataset_name=PlainAbstractsBehavioralDomainLabels
i:Type=ML
i:Threshold=0.2289156626506024
i:Verbosity=1
v:N_train=247.0
v:N_test=3.0
v:LCard_train=1.8461538461538463
v:LCard_test=0.0
v:Build_time=2.79
v:Test_time=0.005
v:Total_time=2.795
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]

I got this result as a text file from a machine learning run. How can I write only the following data to another text file (or any other kind of file)?

1. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
2. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
3. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036

Solution

  • I'm not a regex expert, so corrections to the pattern are welcome, but the following code at least works:

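    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ExtractPredictions {
        public static void main(String[] args) throws IOException {
            // Capture the contents of the second bracketed list on a prediction line,
            // e.g. "[0, 0, 0]:[0.05, 0.01, 0.08]" -> "0.05, 0.01, 0.08".
            Pattern pattern = Pattern.compile(":\\[((\\d\\.\\d+(,\\s)?){0,4})\\]$");
            int count = 1;
            // try-with-resources closes both files even if an exception is thrown.
            try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"));
                 PrintWriter writer = new PrintWriter("out.txt", "UTF-8")) {
                String line;
                while ((line = reader.readLine()) != null) {
                    Matcher matcher = pattern.matcher(line);
                    if (matcher.find()) {
                        // Number each matching line and write it to the file and the console.
                        String capture = count + ". " + matcher.group(1);
                        writer.println(capture);
                        System.out.println(capture);
                        count++;
                    }
                }
            }
        }
    }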
    

    The pattern accepts up to four values per line, so lines of different lengths are handled as well. For example, given this input:

    [0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55]
    [0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
    [0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
    

    The output will be:

    1. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55
    2. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
    3. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
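
Since the answer asks for corrections to the pattern, here is one possible looser variant, offered as a sketch rather than a definitive fix. The pattern above stops at four values and expects each value to look like a single digit, a dot, and more digits, so a longer label list or a value printed in scientific notation (such as 1.0E-4) would not match. The version below simply captures whatever sits between ":[" and the final "]". The class name ConfidenceExtractor, the method extract, and the constant PREDICTION are hypothetical names chosen for illustration, and the sample lines in main are made up to show the two cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfidenceExtractor {
    // Matches ":[", then captures everything up to a "]" that ends the line
    // (trailing whitespace is tolerated). No assumption is made about how
    // many values there are or how each number is formatted.
    private static final Pattern PREDICTION = Pattern.compile(":\\[([^\\]]*)\\]\\s*$");

    // Returns one numbered entry per prediction line; other lines are skipped.
    static List<String> extract(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PREDICTION.matcher(line);
            if (m.find()) {
                out.add((out.size() + 1) + ". " + m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative input: two header lines followed by two prediction lines.
        List<String> sample = Arrays.asList(
                "i:Classifier_ops=[-W, weka.classifiers.rules.ZeroR]",
                "v:N_test=3.0",
                "[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55]",
                "[0, 0]:[0.5, 1.0E-4]");
        for (String entry : extract(sample)) {
            System.out.println(entry);
        }
    }
}
```

Header lines such as i:Classifier_ops=[...] are still skipped, because the colon in them is not immediately followed by an opening bracket. The trade-off is that this pattern no longer checks that the captured text is numeric, so it is only appropriate if every line containing ":[" is known to be a prediction line.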