I am having some problems with this Java task.
I have two files, hello.txt and stopwords.txt. I am trying to remove every word listed in stopwords.txt from hello.txt, and then print to the console the frequencies of the top n most frequent words in the updated hello.txt.
I know how to do this in Python, but not in Java. I believe a hash map would be the best approach for this.
Thank you very much!
I have attempted to use this code, but I am not getting any output:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.*;

public class practice {
    public static void main(String[] args) throws IOException {
        ArrayList stopword = new ArrayList<>();
        try {
            FileInputStream fis = new FileInputStream("stopwords.txt");
            byte b[] = new byte[fis.available()];
            fis.read(b);
            fis.close();
            String data[] = new String(b).trim().split("\n");
            for (int i = 0; i < data.length; i++) {
                stopword.add(data[i].trim());
            }
            FileInputStream fis2 = new FileInputStream("hello.txt");
            byte b1[] = new byte[fis2.available()];
            fis2.read(b);
            fis2.close();
            String data1[] = new String(b1).trim().split("\n");
            // String myFile="";
            for(int i = 0; i < data1.length; i++) {
                String myFile = "";
                String s2[] = data[i].split("/s");
                for (int j = 0; j < s2.length; j++) {
                    if (!(stopword.contains(s2[j].trim().toLowerCase()))) {
                        myFile = myFile+s2[j]+" ";
                    }
                }
                System.out.println(myFile+"\n");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        File file = new File("hello.txt");
        try (Scanner sc = new Scanner(new FileInputStream(file))) {
            int count=0;
            while(sc.hasNext()){
                sc.next();
                count++;
            }
            System.out.println("Number of words for new file: " + count);
        }
    }
}
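A few things in the code above likely explain why nothing useful is printed. fis2.read(b) reads hello.txt into the stop-word buffer b instead of b1, so b1 stays all zero bytes and data1 ends up holding a single empty string. The inner loop splits data[i] (the stop words) instead of data1[i]. And "/s" matches a literal slash followed by s, whereas the regex for whitespace is "\\s". In addition, FileInputStream.available() is only an estimate of the bytes readable without blocking, not a guaranteed file size, and the final Scanner counts the words of the unchanged hello.txt rather than the filtered text. Below is a minimal corrected sketch of the same idea, assuming both files are UTF-8 and that stop words are separated by spaces or newlines; the class name Practice and the helper removeStopWords are my own illustrative names, not from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Practice {

    // Returns each input line with its stop words removed (case-insensitive match).
    static List<String> removeStopWords(List<String> lines, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            StringBuilder kept = new StringBuilder();
            // "\\s+" means one or more whitespace characters; "/s" would match a literal "/s"
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                    if (kept.length() > 0) {
                        kept.append(' ');
                    }
                    kept.append(word);
                }
            }
            result.add(kept.toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Files.readAllLines reads the whole file, unlike a buffer sized by available()
        Set<String> stopWords = new HashSet<>();
        for (String line : Files.readAllLines(Paths.get("stopwords.txt"), StandardCharsets.UTF_8)) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    stopWords.add(w.toLowerCase());
                }
            }
        }
        List<String> filtered = removeStopWords(
                Files.readAllLines(Paths.get("hello.txt"), StandardCharsets.UTF_8), stopWords);
        int count = 0;
        for (String line : filtered) {
            System.out.println(line);
            // kept words are joined by single spaces, so splitting on " " counts them
            if (!line.isEmpty()) {
                count += line.split(" ").length;
            }
        }
        // count the words of the filtered text, not of the unchanged hello.txt
        System.out.println("Number of words for new file: " + count);
    }
}
```

Keeping the stop words in a HashSet makes each lookup constant time, which matters once the stop-word list grows beyond a handful of entries.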
Given a file hello.txt containing:
remove leave remove leave remove leave re move remov e leave remove hello remove world!
and a file stopwords.txt containing:
remove world
Using the Files class, I can read the entire contents of each file into a string. Then I can use String.replaceAll() to remove each stop word from the text. My example doesn't write the new string back to the file, but this can easily be done by adding the following lines:
byte[] strToBytes = helloTxt.getBytes();
Files.write(Paths.get("hello.txt"), strToBytes);
The code to read both files and remove every stop word found:
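import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RemoveWords {
    public static void main(String[] args) {
        try {
            // per @markspace's comment
            // Files.readString (Java 11+) reads the whole file into one String
            String helloTxt = Files.readString(Paths.get("hello.txt"), Charset.defaultCharset());
            String stopWordsTxt = Files.readString(Paths.get("stopwords.txt"), Charset.defaultCharset());
            // split on runs of whitespace so spaces, newlines and \r\n all separate stop words
            String[] stopWords = stopWordsTxt.trim().split("\\s+");
            for (String stopWord : stopWords) {
                // note: replaceAll treats stopWord as a regular expression
                helloTxt = helloTxt.replaceAll(stopWord, "");
            }
            System.out.println(helloTxt);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}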
Output:
leave leave leave re move remov e leave hello !
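One caveat with the replaceAll() approach: its first argument is a regular expression, and it matches anywhere in the text, not only whole words. A stop word such as the would also be cut out of other, and a stop word containing regex metacharacters such as . or ( would not be matched literally. Here is a sketch of a whole-word variant, assuming stop words consist of letters and digits (the \b boundary behaves differently next to punctuation). The class and method names are hypothetical, and the case-insensitive match and the collapsing of leftover spaces are my own additions:

```java
import java.util.regex.Pattern;

public class RemoveWholeWords {

    // Removes each stop word only where it appears as a whole word (case-insensitive),
    // then collapses the runs of spaces and tabs left behind.
    static String removeStopWords(String text, String[] stopWords) {
        for (String stopWord : stopWords) {
            if (stopWord.isEmpty()) {
                continue;
            }
            // Pattern.quote matches the stop word literally, even if it contains regex metacharacters;
            // \b restricts the match to word boundaries, so "the" is not removed from "other"
            Pattern p = Pattern.compile("\\b" + Pattern.quote(stopWord) + "\\b", Pattern.CASE_INSENSITIVE);
            text = p.matcher(text).replaceAll("");
        }
        return text.replaceAll("[ \\t]+", " ").trim();
    }

    public static void main(String[] args) {
        String hello = "remove leave remove leave remove leave re move remov e leave remove hello remove world!";
        System.out.println(removeStopWords(hello, new String[] {"remove", "world"}));
    }
}
```

For the example files above this prints leave leave leave re move remov e leave hello !, with the double spaces left by each removal collapsed into one.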
To calculate the frequency of words, you may want to check out this solution I came up with for another use case.
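For the top-n part of the question, here is a minimal sketch of the hash map approach: count each word with Map.merge, then sort the entries by count (highest first) and keep the first n. The class name TopWords, the method names countWords and topN, and lowercasing every word before counting are assumptions for illustration; ties are broken alphabetically so the result is deterministic. The input string in main is hard-coded to the filtered text from the example, but in practice you would pass in helloTxt after the stop words are removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWords {

    // Counts how often each whitespace-separated word occurs in the text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // insert 1 for a new word, otherwise add 1 to the existing count
                counts.merge(word.toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n most frequent entries, highest count first; ties are ordered alphabetically.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        String filtered = "leave leave leave re move remov e leave hello !";
        for (Map.Entry<String, Integer> e : topN(countWords(filtered), 3)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With this input the program prints leave: 4, then !: 1 and e: 1 (the first two of the words that occur once, in alphabetical order). For very large vocabularies a PriorityQueue of size n would avoid sorting every entry, but a full sort keeps the sketch short.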