I have a Java assignment (taken from Programming Pearls) where I have to read input from a text file (using Scanner and FileReader objects), remove all punctuation and numbers, and then build an ArrayList with the words sorted alphabetically in ascending order. Then I have to print out each word followed by the number of times it occurs (its number of occurrences), removing the duplicates.
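For reference, here is a minimal sketch of the whole assignment as described above, in the sort-then-count style Programming Pearls uses: once the list is sorted, equal words sit next to each other, so a single pass can count each run. It assumes a Scanner built over a String instead of `new FileReader(...)` so it runs anywhere; the class and method names (`WordCountSketch`, `sortedWords`, `countRuns`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

public class WordCountSketch {

    // Reads whitespace-separated tokens, lowercases them, keeps only letters,
    // drops tokens that end up empty, and returns the words sorted ascending.
    static List<String> sortedWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            String word = in.next().toLowerCase().replaceAll("[^a-z]", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        Collections.sort(words);
        return words;
    }

    // Walks the sorted list once: identical words are adjacent, so each run
    // of equal entries collapses to a single "word count" line.
    static List<String> countRuns(List<String> sorted) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String word = sorted.get(i);
            int j = i;
            while (j < sorted.size() && sorted.get(j).equals(word)) {
                j++;
            }
            lines.add(word + " " + (j - i));
            i = j;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: a String stands in for the text file so the sketch is self-contained.
        Scanner in = new Scanner("In the cool, cool breeze of the evening the nightingale sang 17 sweet songs");
        for (String line : countRuns(sortedWords(in))) {
            System.out.println(line);
        }
        in.close();
    }
}
```

Swapping the String for `new FileReader(path)` (and handling `FileNotFoundException`) gives the file-based version the assignment asks for.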
My problem is that the replaceAll() method I am using does remove everything as expected, but with one unexpected consequence: the number in my text is converted into a blank token (word), which is then inserted into my ArrayList. When I inspect the ArrayList, that element shows as "".
I've tried all sorts of different regular expressions, with the same result. Does anyone have an idea why this is happening and how to avoid it?
Here is the code excerpt, where dictionary is the ArrayList object and inFile the Scanner object:
dictionary.add(inFile.next().toString().toLowerCase().replaceAll("[^a-zA-z]", "").trim());
Many thanks
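Note that the character class in the excerpt above, `[^a-zA-z]`, is not the same as `[^a-zA-Z]`: the range `A-z` runs from U+0041 to U+007A and therefore also contains `[`, `\`, `]`, `^`, `_` and the backtick. A small illustrative comparison (the class `CharClassDemo` and the token `"it's_17"` are made up for the example):

```java
public class CharClassDemo {
    // Removes every character matched by the given character class.
    static String strip(String token, String charClass) {
        return token.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        String token = "it's_17";
        // A-z covers U+0041..U+007A, which also includes [ \ ] ^ _ and the backtick
        System.out.println(strip(token, "[^a-zA-z]")); // its_
        // A-Z covers upper-case letters only
        System.out.println(strip(token, "[^a-zA-Z]")); // its
    }
}
```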
Sorry, guys, my bad: yes, it should be a-zA-Z. But nothing changes. Here are the code and the output:
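import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

public class Dictionary
{
    private List<String> dictionary;
    private Scanner inFile;

    public Dictionary()
    {
        this.dictionary = new ArrayList<String>();
    }

    // FileReader's constructor throws the checked FileNotFoundException
    public void parseText() throws FileNotFoundException
    {
        inFile = new Scanner(new FileReader("C:\\Users\\User\\Desktop\\Ovid.txt"));
        while (inFile.hasNext())
            dictionary.add(inFile.next().toString().toLowerCase().replaceAll("[^a-zA-Z]", "").trim());
        inFile.close();
        Collections.sort(dictionary);
    }
}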
And here's the output (from a print statement):
dictiona1.parseText();
in the cool cool breeze of the evening the nightingale sang

sweet songs
As you can see, after "sang" there is a blank line. In the original text there is a number at that position ("...sang 17 sweet songs"). Inspecting the ArrayList confirms the same.
Many thanks.
This is the expected behavior of your code. replaceAll("[^a-zA-Z]", "") replaces every character that is not a letter with the empty string, so a token made only of digits, such as "17", becomes "". Your loop then adds that empty String to the List.
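To see this in isolation, here is a small sketch that mirrors the loop from the question but reads from a String instead of the file (the class `EmptyTokenDemo` and method `parseUnfiltered` are hypothetical names for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class EmptyTokenDemo {
    // Mirrors the question's loop: every token is added, even when
    // replaceAll leaves nothing behind.
    static List<String> parseUnfiltered(String text) {
        List<String> words = new ArrayList<>();
        Scanner in = new Scanner(text);
        while (in.hasNext()) {
            words.add(in.next().toLowerCase().replaceAll("[^a-zA-Z]", ""));
        }
        in.close();
        return words;
    }

    public static void main(String[] args) {
        List<String> words = parseUnfiltered("sang 17 sweet songs");
        System.out.println(words);                  // [sang, , sweet, songs]
        System.out.println(words.get(1).isEmpty()); // true
    }
}
```

The "17" token survives as an element of length zero, which is exactly the "" you see when inspecting the ArrayList.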
You can avoid this by checking each value before you add it to the List. Split the long chained call in your while loop into separate steps (a good practice in general, since it is easier to read and debug), and only insert non-empty strings into the List:
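while (inFile.hasNext()) {
    String next = inFile.next().toLowerCase();
    // Remove every non-letter; a digits-only token such as "17" becomes ""
    String replaced = next.replaceAll("[^a-zA-Z]", "");
    if (!replaced.isEmpty()) { // skip tokens that contained no letters
        dictionary.add(replaced);
    }
}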
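A different technique, not the one above, is to make the Scanner itself treat every run of non-letters as a delimiter via `useDelimiter`, so it never hands back an empty token. A minimal sketch, assuming a String input and a hypothetical class name `DelimiterDemo`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Any run of non-letters (spaces, digits, punctuation) separates tokens,
    // so every token returned by next() contains only letters.
    static List<String> parse(String text) {
        List<String> words = new ArrayList<>();
        try (Scanner in = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (in.hasNext()) {
                words.add(in.next().toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parse("sang 17 sweet songs")); // [sang, sweet, songs]
        System.out.println(parse("it's"));                // [it, s]
    }
}
```

One design difference to be aware of: this splits a word like "it's" into "it" and "s", whereas the replaceAll approach joins it into "its". Which behavior is right depends on how the assignment wants contractions counted.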
I can't say for sure where the blank line after "sang" comes from until you post the exact input string you are using, but since "17" sits at that position, it is most likely the empty string being printed.
Hope that helps