
Count of each word


I need to write a Java program that counts the total number of words in a given string and prints the count of each word in alphabetical order.

Using Collections, I need to sort the words in the given string and print each one with its count in alphabetical order, with one restriction.

Words enclosed in double quotes (e.g. "wrapped") should be sorted and printed last.

Whenever I sort the list, words enclosed in double quotes are sorted first (because the quote character comes before the letters in the ASCII table), but I need all non-quoted words to be sorted before the double-quoted words.
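To illustrate the behaviour described above, here is a minimal sketch (the class name NaturalOrderDemo and the helper naturalSort are made up for this illustration). It only shows that the default String ordering compares character codes, so a word starting with a double quote (code 34) lands before any word starting with a lowercase letter (codes 97 to 122).

```java
import java.util.*;

public class NaturalOrderDemo {
    // Returns a copy of the words sorted by String's natural order,
    // which compares character codes one by one.
    static List<String> naturalSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '"' has code 34, 'a' has code 97, so the quoted word comes first.
        System.out.println(naturalSort(Arrays.asList("using", "\"wrapped\"", "a")));
        // prints ["wrapped", a, using]
    }
}
```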

Please help me find a solution for this kind of sorting.

import java.util.*;

public class UniqueWord {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String inp = sc.nextLine();
        // Lower-case the input; replace everything except letters, quotes and spaces with a space
        inp = inp.toLowerCase().replaceAll("[^a-z'\" ]", " ");
        int count = 0;
        char[] ch = new char[inp.length()];
        for (int i = 0; i < inp.length(); i++) {
            ch[i] = inp.charAt(i);
            // A word starts at a non-space character that is first or follows a space
            if (((i > 0) && (ch[i] != ' ') && (ch[i - 1] == ' ')) || ((ch[0] != ' ') && (i == 0))) {
                count++;
            }
        }
        System.out.println("Number of words " + count);
    }
}

Input:

The implementation in a TreeSet is not synchronized in a sense that if multiple threads access a tree set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSortedSet method.

Expected output:

Number of words 64

Words with the count

a: 3

access: 1

accomplished: 1

and: 1

at: 1

be: 2

by: 1

collections: 1

concurrently: 1

encapsulates: 1

exists: 1

externally: 1

if: 2

implementation: 1

in: 2

is: 2

it: 1

least: 1

method: 1

modifies: 1

multiple: 1

must: 1

naturally: 1

no: 1

not: 1

object: 2

of: 1

on: 1

one: 1

sense: 1

set: 4

should: 1

some: 1

such: 1

synchronized: 2

synchronizedsortedset: 1

synchronizing: 1

that: 2

the: 6

this: 1

threads: 2

tree: 1

treeset: 1

typically: 1

using: 1

“wrapped”: 1

EDIT: Got the solution.


Solution

  • Look at this code and adapt it as required. Leverage the features of the Collections framework instead of writing your own code where they already exist, and try to improve on the code below.

        String input = "This is a \"long\" statement.SortedSet Collections.";
    
        //split string on runs of your delimiters ( space, comma, dot ) so that no empty tokens appear between them
        String[] split = input.split("[ ,.]+");
        List<String> splitData = Arrays.asList(split);
    
        //create the data map with the number of occurrences
        Map<String, Integer> dataToNumOccurances = new HashMap<>();
        for (String aString : splitData) {
            int occurrences = Collections.frequency(splitData, aString);
            dataToNumOccurances.put(aString, occurrences);
        }
    
        //convert to list so that it could be custom sorted
        List<String> sortedWords = new ArrayList<>(dataToNumOccurances.keySet());
        sortedWords.sort(new Comparator<String>()
        {
            @Override
            public int compare(String m1, String m2)
            {
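                //apply the rule to push back double quoted strings:
                //if exactly one of the two is quoted, the quoted one goes last
                boolean q1 = m1.startsWith("\"");
                boolean q2 = m2.startsWith("\"");
                if (q1 != q2) {
                    return q1 ? 1 : -1;
                }
                //apply case-insensitive sort within each group
                return m1.compareToIgnoreCase(m2);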
            }
        });
    
    
        for (String word : sortedWords) {
            System.out.println("Word: " + word + ", count: " + dataToNumOccurances.get(word));
        }
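
    As one possible improvement along those lines, here is a sketch that counts and sorts in a single pass with a TreeMap and a custom comparator. The class name QuotedLastWordCount and the helpers isQuoted and countWords are made up for this example. It also assumes that a word counts as quoted if it starts with either a straight (") or a curly (“) double quote, since the sample input in the question uses curly quotes; adjust that rule and the delimiter set to your needs.

```java
import java.util.*;

public class QuotedLastWordCount {
    // Assumption: a word is "quoted" if it starts with a straight or a curly double quote.
    static boolean isQuoted(String w) {
        return w.startsWith("\"") || w.startsWith("\u201C");
    }

    // Unquoted words first, quoted words last, alphabetical within each group.
    static final Comparator<String> QUOTED_LAST = (a, b) -> {
        int byQuote = Boolean.compare(isQuoted(a), isQuoted(b)); // false sorts before true
        return byQuote != 0 ? byQuote : a.compareTo(b);
    };

    static Map<String, Integer> countWords(String input) {
        // The TreeMap keeps its keys ordered by the comparator while we count.
        Map<String, Integer> counts = new TreeMap<>(QUOTED_LAST);
        // Lower-case, then split on runs of whitespace, commas and dots.
        for (String w : input.toLowerCase().split("[\\s,.]+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "This is a \"long\" statement.SortedSet Collections.";
        countWords(input).forEach((w, c) -> System.out.println(w + ": " + c));
    }
}
```

    Map.merge replaces the Collections.frequency loop, so each word is counted once instead of rescanning the whole list for every word.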