Tags: java, regex, string, replaceall

Strange behavior with Regex in Java


I want to filter a text, leaving only letters (a-z and A-Z). It seemed easy, following the approach from "How to filter a Java String to get only alphabet characters?":

String cleanedText = text.toString().toLowerCase().replaceAll("[^a-zA-Z]", "");         
System.out.println(cleanedText);

The problem is that the output is empty unless I add another character to the regex, e.g. a colon: [^:a-zA-Z]

I already tried to check whether it works with plain regex (without using the replaceAll method of Java's String class), but I had exactly the same problem.

Any idea what could be the source of this strange behavior?

I have a .txt file that I read using a BufferedReader. I append each line to one long string and apply the code above to it. The whole code is as follows:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Loader {

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();

        // try-with-resources closes the reader; a missing file is reported
        // here instead of causing a NullPointerException further down.
        try (BufferedReader file = new BufferedReader(new FileReader("text.txt"))) {
            String str;
            // Append every line to one long string (line breaks are dropped).
            while ((str = file.readLine()) != null) {
                text.append(str);
            }
        } catch (IOException ex) {
            System.err.println("Could not read text.txt: " + ex.getMessage());
            return;
        }

        // Remove everything that is not ':' or a lowercase letter.
        String cleanedText = text.toString().toLowerCase().replaceAll("[^:a-z]", "");
        System.out.println(cleanedText);
    }
}

The text file is a normal article from which I want to delete everything (including whitespace) that is not a letter. An extract is as follows: "[16]The Free Software Foundation (FSF), started in 1985, intended the word "free" to mean freedom to distribute"
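As a sanity check, the regex can be tried on part of the extract in isolation, without the file or the console involved. This is only a minimal sketch: the class name LetterFilter and the method lettersOnly are hypothetical, and Locale.ROOT is an added assumption so that lowercasing does not depend on the machine's default locale.

```java
import java.util.Locale;

public class LetterFilter {

    // Hypothetical helper: lowercases the text and strips every character
    // that is not an ASCII letter a-z (digits, punctuation, whitespace).
    static String lettersOnly(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    public static void main(String[] args) {
        String extract = "[16]The Free Software Foundation (FSF)";
        // prints "thefreesoftwarefoundationfsf"
        System.out.println(lettersOnly(extract));
    }
}
```

If this prints a non-empty result, the regex itself is behaving as intended and the cause has to be somewhere else.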


Solution

  • In the end, the problem was neither the regex nor the program itself. Eclipse simply does not show the output in the console if a line exceeds a certain length (the string is still there and you can keep working with it). To solve this, enable the "Fixed width console" option under Window -> Preferences -> Run/Debug -> Console, as described in http://code2care.org/2015/how-to-word-wrap-eclipse-console-logs-width/

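Independently of the console setting, one way to confirm that the string is not really empty is to print its length and then print it in shorter lines. The following is a rough sketch, not part of the original program: the class name ConsoleCheck, the helper chunk, the sample string and the width of 80 characters are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsoleCheck {

    // Hypothetical helper: splits s into pieces of at most `width` characters.
    static List<String> chunk(String s, int width) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += width) {
            parts.add(s.substring(i, Math.min(s.length(), i + width)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for the very long cleaned text from the question.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("freesoftware");
        }
        String cleanedText = sb.toString();

        // The length shows whether the string is empty, even when the
        // console fails to render one very long line.
        System.out.println("length = " + cleanedText.length());

        // Printing in 80-character lines avoids a single huge line.
        for (String line : chunk(cleanedText, 80)) {
            System.out.println(line);
        }
    }
}
```

Printing the length first makes it clear whether the problem is in the data or only in how the console displays it.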