java twitter twitter4j input-filtering

How to reformat tweets, replacing single quotes with escaped quotes consistently


Currently, I have a method, shown below, that is meant to escape all single quotes and remove all newline characters in tweets retrieved through the Twitter API. The newline removal works every time, but the quote escaping only works about half the time. The method is called on the line immediately before the tweet is written to file, so I doubt it is skipping any iterations. Which tweets get filtered seems random, and I can't explain why it sometimes works. Strangely, removing the if statement around the quote replacement results in nothing being filtered.

Thanks in advance.

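public static String replace(String x) {
    String replaced = x;

    // Escape every single quote: ' becomes \'
    if (x.contains("'")) {
        replaced = x.replaceAll("'", "\\\\'");
    }
    // Replace line breaks with spaces. Note: this starts again from x,
    // so it overwrites the escaped result computed above.
    if (x.contains("\n") || x.contains("\r")) {
        replaced = x.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    System.out.println(replaced);

    return replaced;
}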
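The symptoms are consistent with the code itself: both replaceAll calls start from x, so when a tweet contains a line break, the second assignment overwrites the escaped string. A minimal reproduction might look like this. It uses the question's method unchanged apart from the removed println, plus two made-up sample strings (not taken from the linked data); the class name ReplaceRepro is hypothetical.

```java
public class ReplaceRepro {
    // Copy of the question's method, without the println.
    public static String replace(String x) {
        String replaced = x;
        if (x.contains("'")) {
            replaced = x.replaceAll("'", "\\\\'");
        }
        if (x.contains("\n") || x.contains("\r")) {
            // Starts again from x, so the escaping above is lost.
            replaced = x.replaceAll("\\r\\n|\\r|\\n", " ");
        }
        return replaced;
    }

    public static void main(String[] args) {
        // Apostrophe only: the quote is escaped as expected.
        System.out.println(replace("You're here"));
        // Apostrophe plus a line break: the quote comes back unescaped.
        System.out.println(replace("You're\nhere"));
    }
}
```

Running main prints You\'re here followed by You're here: the apostrophe is escaped only when the tweet has no line break.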

Edit: Looking into it further, the if statement does fire, but for a small minority of tweets execution simply reaches the replaceAll line and the quotes still don't get replaced. Why? I have no clue.

Sample data: https://justpaste.it/15c6t (the first failure is "You're" on line 20).


Solution

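  • The second replaceAll call started from the original string x rather than from replaced, so whenever a tweet contained a line break, the quote escaping done by the first call was thrown away. That is why only some tweets were affected. Splitting the two operations into separate methods (awkward as it is) fixes this, as long as they are chained so that the output of one is the input of the other, e.g. removeEnters(replace(tweet)).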

    public static String replace(String x) { // Escapes the single quotes
        String replaced = x;
        if (x.contains("'")) {
            replaced = x.replaceAll("'", "\\\\'");
        }
        return replaced;
    }

    public static String removeEnters(String x) { // Replaces line breaks with spaces
        String replaced = x;
        if (x.contains("\n") || x.contains("\r")) {
            replaced = x.replaceAll("\\r\\n|\\r|\\n", " ");
        }
        return replaced;
    }
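If the two-method split feels awkward, the same fix can live in one method. The only change the bug strictly needs is that each step builds on the previous result instead of on the original input. The sketch below (the class name TweetCleaner and method name clean are hypothetical) also swaps replaceAll for String.replace on the quote step: String.replace treats both arguments literally rather than as a regex and replacement pattern, so the replacement needs only one escaped backslash ("\\'") instead of four.

```java
public class TweetCleaner {
    // Escapes single quotes and turns every line break into a space.
    public static String clean(String tweet) {
        // String.replace is literal (no regex): each ' becomes \'
        String cleaned = tweet.replace("'", "\\'");
        // Continue from cleaned, not from tweet, so the escaping is kept.
        // \r\n is listed first so a Windows line break becomes one space, not two.
        cleaned = cleaned.replaceAll("\\r\\n|\\r|\\n", " ");
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(clean("You're\nhere")); // prints You\'re here
    }
}
```

Calling clean(tweet) replaces the replace/removeEnters pair. The if checks are dropped because replace and replaceAll simply return an equal string when nothing matches.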