Read a text file, replace certain words using a HashMap of replacements, and preserve punctuation and upper case


I have a HashMap with overused words as keys and their replacements as values. Here are some entries from the map:

[ amazing:astonishing interesting:intriguing literally:frankly nice:pleasant hard:taxing change:transform ... ]

I have to implement a class that searches for overused words in a given text file and replaces them with better choices. Old text file:

" "Amazing" is really the best way I can think of to describe it. Literally, it is hard to express how much I liked it. It was amazingly NICE!!!!! Good, not bad. I wouldn't change a bit of it. Please, be nice and help me fix my writing!! b BB bbb Bb B."

New text file:

" "Astonishing" is really the best way I can think of to describe it. Frankly, it is taxing to express how much I liked it. It was amazingly PLEASANT!!!!! Superior, not inferior. I wouldn't transform a bit of it. Please, be pleasant and help me fix my writing!! cat BB bbb Bb CAT "

  • TextImprover must preserve the punctuation of the input file.
  • Assume all words in the input file are either in all lower case, leading upper case, or all caps.

I have implemented the first part, a constructor that reads a text file and builds the map of overused words:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

public class TextImprover {

    private HashMap<String, String> wordMap;

    /**
     * Constructor
     *
     * @param wordMapFileName   name of the file containing the over-used words and their replacements
     */
    public TextImprover(String wordMapFileName) {
        this.wordMap = new HashMap<String, String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(wordMapFileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Each line holds an over-used word and its replacement, separated by a tab
                String[] wordLine = line.split("\t");
                if (wordLine.length < 2) {
                    continue; // skip blank or malformed lines
                }
                String overUsedWord = wordLine[0].trim();
                String replaceWord = wordLine[1].trim();

                wordMap.put(overUsedWord, replaceWord);
            }
        } catch (FileNotFoundException e) {
            System.out.println("File: " + wordMapFileName + " not found");
        } catch (IOException e1) {
            System.out.println(e1.getMessage());
        }
    }

I need help with this method:

/**
     * Replaces all of the over-used words in the given file with better words, based on the word map
     * used to create this TextImprover
     * 
     * @param fileName  name of the file containing the text to be improved
     */
    public void improveText(String fileName) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(fileName));
            String line ;
            while((line = br.readLine())!= null) {
                String[] lineWords = line.split(" ");
                // The code I'm struggling with
            }
            br.close();
                
            }catch(FileNotFoundException e){
                System.out.println("File: "+ fileName + " not found");  
            }catch (IOException e1) {
                System.out.println(e1.getMessage());
            }

    }

Thank you for your help.


Solution

  • Instead of the split method, which also takes a regular expression, I would use the regular expression [a-zA-Z]+ in the "usual" way, with a Pattern and a Matcher, to find each word in your input. Because only runs of letters are matched, punctuation and whitespace are left untouched.

    Then use the Matcher.replaceAll(Function<MatchResult,String> replacer) method (available since Java 9). The function receives each match; inside it you can fetch the replacement from the map and decide whether to convert it to all upper case or title case (only the first character upper case).

    The equivalent of the code you posted (without the actual replacement logic, which is left as TODOs) would look like this:

    Pattern pattern = Pattern.compile("[a-zA-Z]+"); // best outside the while loop!
    
    // The following replaces your String[] lineWords = line.split(" "); inside the loop
    Matcher matcher = pattern.matcher(line);
    String result = matcher.replaceAll(match -> {
    
        String word = match.group();
        // TODO: find out if word is "ALL CAPS" or "Title Case"
        // TODO: get replacement from map - don't forget to convert the input to the map toLowerCase()
        String replacement = ...;
        // Note: '$' and '\' in the returned string are treated specially; see Matcher.quoteReplacement
        return replacement;
    });
    
    // At this point, result contains the whole line with all replacements applied.
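
    One possible way to fill in the two TODOs is sketched below. It is only an illustration, not the definitive implementation: the class name TextImproverSketch and the helpers improveLine and matchCase are hypothetical, the map entries are the ones shown in the question, and a single capital letter is treated as "all caps" because the question's example turns "B" into "CAT". It needs Java 9 or later for Matcher.replaceAll(Function).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the approach described above.
class TextImproverSketch {

    // Compiled once; matches runs of ASCII letters, so punctuation is never part of a match.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    /** Returns the line with every mapped word replaced, keeping the case style of the original word. */
    static String improveLine(String line, Map<String, String> wordMap) {
        Matcher matcher = WORD.matcher(line);
        return matcher.replaceAll(match -> {
            String word = match.group();
            // Assumes the map keys are stored in lower case.
            String replacement = wordMap.get(word.toLowerCase(Locale.ROOT));
            if (replacement == null || replacement.isEmpty()) {
                replacement = word; // not an overused word: keep it unchanged
            } else {
                replacement = matchCase(word, replacement);
            }
            // quoteReplacement keeps '$' and '\' in the result from being read as group references.
            return Matcher.quoteReplacement(replacement);
        });
    }

    /** Applies the case style of original (all caps, leading capital, or lower case) to replacement. */
    static String matchCase(String original, String replacement) {
        if (original.equals(original.toUpperCase(Locale.ROOT))) {
            return replacement.toUpperCase(Locale.ROOT); // NICE -> PLEASANT, B -> CAT
        }
        if (Character.isUpperCase(original.charAt(0))) {
            // Literally -> Frankly
            return Character.toUpperCase(replacement.charAt(0)) + replacement.substring(1);
        }
        return replacement; // hard -> taxing
    }

    public static void main(String[] args) {
        Map<String, String> wordMap = new HashMap<>();
        wordMap.put("literally", "frankly");
        wordMap.put("hard", "taxing");
        wordMap.put("nice", "pleasant");

        String line = "Literally, it is hard to express how much I liked it. It was amazingly NICE!!!!!";
        System.out.println(improveLine(line, wordMap));
        // prints: Frankly, it is taxing to express how much I liked it. It was amazingly PLEASANT!!!!!
    }
}
```

    Inside improveText you would call improveLine(line, wordMap) for every line read and write the result to the output. "amazingly" is left alone because the pattern matches whole runs of letters and "amazingly" is not a key in the map.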