
My code runs for 15 minutes but shows only blank output?


I'm writing stop-word removal code for data cleaning. I followed a YouTube tutorial (https://www.youtube.com/watch?v=ckQUlI7x7hI). His code works and shows output, but mine doesn't.

I'm using English stop words; examples are "a", "an", "away", and "keeps". For the input "An apple a day keeps the doctor away", the output should be "apple day the doctor".

This is the content of my file: https://ufile.io/gikev

Here is the code:

import java.io.FileInputStream;
import java.util.ArrayList;

public class DataCleaning {


public static void main(String[] args) {

    ArrayList sw = new ArrayList<>();

    try{
        FileInputStream x = new FileInputStream("/Users/Dan/Desktop/DATA/stopwords.txt");

        byte b[] = new byte[x.available()];
        x.read(b);
            x.close();

            String data[] = new String(b).split("\n");

        for(int i = 0; i < data.length; i++)
        {
            sw.add(data[i].trim());
        }
         FileInputStream xx = new FileInputStream("/Users/Dan/Desktop/DATA/cleandata.txt");

        byte bb[] = new byte[xx.available()];
        xx.read(bb);
            xx.close();

            String dataa[] = new String(bb).split("\n");



            for(int i = 0; i < dataa.length; i++)

        {
            String file = "";
            String s[] = dataa[i].split("\\s");
            for(int j = 0; j < s.length; i++)
            {
                if(sw.contains(s[j].trim().toLowerCase()))
                {
                    file=file + s[j] + " ";
                }

            }
            System.out.println(file + "\n");
        }

    } catch(Exception a){
        a.printStackTrace();
    }

   }

 }

When I run mine, the output stays blank.

What should I do?


Solution

  • There are three issues with your code:

    1. The innermost loop increments the wrong variable. It increments i
      instead of j, so j stays at 0, the condition j < s.length is always
      true, and the loop never ends. Change this line:

      for (int j = 0; j < s.length; i++) {
      

      to

      for (int j = 0; j < s.length; j++) {
      
    2. To print the words that are not stop words, negate the if condition:

      if (!sw.contains(s[j].trim().toLowerCase()))
      
    3. Also, make sure the words in stopwords.txt are separated by \n (one word per line), because the code splits the file contents on \n. The file in the link you shared is not formatted that way.

    I recommend indenting your code and giving your variables meaningful names. Bugs like this are much easier to spot that way.
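
Putting the three fixes together, here is a minimal sketch of how the program could look with descriptive names. It is not the tutorial's code: the class name StopWordFilter, the helper methods parseStopWords and removeStopWords, and passing the two file paths as command-line arguments are all assumptions made for illustration. It also assumes the stop words are separated by whitespace of any kind (newlines or spaces), which sidesteps the file format problem from point 3.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical class name, chosen for this sketch.
class StopWordFilter {

    // Builds a lower-cased set of stop words from the raw file contents.
    // Assumption: words are separated by any whitespace (newlines or spaces).
    static Set<String> parseStopWords(String text) {
        Set<String> stopWords = new HashSet<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                stopWords.add(word.toLowerCase());
            }
        }
        return stopWords;
    }

    // Returns the line with every stop word removed, words joined by one space.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String word : line.trim().split("\\s+")) {
            // Note the negation: keep the word only if it is NOT a stop word.
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java StopWordFilter <stopwords file> <input file>");
            return;
        }
        String stopWordText =
                new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Set<String> stopWords = parseStopWords(stopWordText);

        // The for-each loops leave no index variable to increment by mistake.
        for (String line : Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            System.out.println(removeStopWords(line, stopWords));
        }
    }
}
```

Using for-each loops removes the loop counters entirely, so the i/j mix-up cannot happen, and a HashSet makes each contains check a hash lookup rather than a scan of the whole list.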