java grouping similarity levenshtein-distance

Classifying and grouping strings in array in Java


I need to find the best way to classify and group an array of strings. Let's say I have this array:

String[] resources = {"tester1", "tester2", "solverC1", "solverC2", "solverS2", "solverS1", "tester3", "tester4", "system"};

The real array contains many more strings, but the idea is the same. For this input, the result should report the following resources:

Resource: tester || Quantity: 4
Resource: system || Quantity: 1
Resource: solver || Quantity: 4

Is Levenshtein distance the best approach here? If so, does anyone have ideas on how to group the array, extract the base names (without the trailing numbers or letters), and count how many of each there are?


Solution

  • You can use Java streams to collect the counts into a map. For example:

    List<String> res = Arrays.asList("tester1", "tester2", "solverC1", "solverC2", "solverS2", "solverS1", "tester3", "tester4", "system");

    Map<String, Long> result = res.stream().collect(Collectors.groupingBy(s -> s.replaceAll("\\d", ""), Collectors.counting()));
    

    The map will hold the values system=1, tester=4, solverS=2, solverC=2 (the iteration order of the returned map is not guaranteed).

    I used groupingBy with a classifier that removes only the digits from each string, but you can define any rule you want here:

    Collectors.groupingBy(s -> s.replaceAll("\\d", ""), Collectors.counting())

    Which rule you choose depends on whether you want solverS to be counted separately from solverC.

    Working example:

    import java.util.Arrays;
    import java.util.Date;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.stream.Collectors;

    public class MainClass {
        public static void main(String[] args) {
            System.out.println(new Date() + ": Let's start our StackOverflow helper project!");

            List<String> res = Arrays.asList("tester1", "tester2", "solverC1", "solverC2",
                    "solverS2", "solverS1", "tester3", "tester4", "system");

            // Group by the name with all digits removed and count the members of each group.
            Map<String, Long> results = res.stream()
                    .collect(Collectors.groupingBy(s -> s.replaceAll("\\d", ""), Collectors.counting()));

            // Build a single line of the form key="count", key="count", ...
            StringBuilder sb = new StringBuilder();
            Iterator<Entry<String, Long>> iter = results.entrySet().iterator();
            while (iter.hasNext()) {
                Entry<String, Long> entry = iter.next();
                sb.append(entry.getKey());
                sb.append('=').append('"');
                sb.append(entry.getValue());
                sb.append('"');
                if (iter.hasNext()) {
                    sb.append(',').append(' ');
                }
            }
            System.out.println(sb.toString());
        }
    }
    
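    // After the date line, it should print system="1", tester="4", solverS="2", solverC="2"
    // (the entries may appear in a different order, since the map's iteration order is not guaranteed)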
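
If solverC and solverS should be counted as one resource, as in the question's expected output, the classifier has to strip the variant letter as well as the digits. Here is a minimal sketch of one such rule, assuming every suffix consists of at most one uppercase letter followed by digits at the end of the name. The class name ResourceGrouping, the helper baseName, and the regular expression are illustrative choices, not part of the original answer. A TreeMap is passed as the map factory only to make the output order predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ResourceGrouping {

    // Assumed rule: drop the trailing digits and, if present, the single
    // uppercase variant letter right before them ("solverC1" -> "solver").
    // Names without trailing digits ("system") are returned unchanged.
    static String baseName(String resource) {
        return resource.replaceAll("[A-Z]?\\d+$", "");
    }

    // Counts how many resources share each base name; keys are sorted
    // alphabetically because the result is collected into a TreeMap.
    static Map<String, Long> countByBaseName(List<String> resources) {
        return resources.stream()
                .collect(Collectors.groupingBy(
                        ResourceGrouping::baseName,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("tester1", "tester2", "solverC1", "solverC2",
                "solverS2", "solverS1", "tester3", "tester4", "system");

        // Print one line per group in the format used in the question.
        countByBaseName(resources).forEach((name, quantity) ->
                System.out.println("Resource: " + name + " || Quantity: " + quantity));
    }
}
```

With the sample input from the question, this groups everything under solver (4), system (1) and tester (4). Names that do not fit the assumed suffix pattern end up in groups of their own, so the regular expression would need adjusting for other naming schemes.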