java android sorting alphabetical

Alphabetical sorting pretty slow


Here is my problem: I have a list of custom objects, each containing a String called label. The list is big but not too big, around 1000 objects. I would like to sort it alphabetically by label.

The thing is, some labels start with a character such as É, (, e or E. So I had to use the deAccent() function found here to sort independently of accents and the like. With this function, the list ['Gab','eaaa','Éaa'] is sorted as ['Éaa','eaaa','Gab'] instead of ['eaaa','Gab','Éaa'], which is what a plain lowercase compareTo() produces because É comes after G. Here is what I have:

private List<Formula> sortFormulaList(List<Formula> formulaList) {
    // Sort all label alphabetically
    if (formulaList.size() > 0) {
        Collections.sort(formulaList, (formula1, formula2) ->
                deAccent(formula1.getLabel()).toLowerCase().compareTo(deAccent(formula2.getLabel().toLowerCase())));
    }
    return formulaList;
}

private String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
}

Without deAccent() the sort is fast enough for my purpose, but with it the sort takes about 1 to 3 seconds.

Any idea how I could do such a sort differently, or make this one faster?
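
The slowdown described above is consistent with how often deAccent() runs during a sort: the comparator calls it twice per comparison, and each call compiles the regex again. The sketch below only counts those calls; the class name DeAccentCallCounter, the random labels, and the fixed seed are assumptions made for illustration, not part of the original code.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
final class DeAccentCallCounter {
    static int calls = 0;

    // Same logic as deAccent() in the question, plus a call counter.
    static String deAccent(String str) {
        calls++;
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfd).replaceAll("");
    }

    // Sorts n made-up labels with a comparator shaped like the question's
    // and returns how many times deAccent() was invoked.
    static int countCallsForSort(int n) {
        Random random = new Random(42); // fixed seed, illustrative only
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            labels.add("label" + random.nextInt(1_000_000));
        }
        calls = 0;
        Collections.sort(labels, (a, b) ->
                deAccent(a).toLowerCase().compareTo(deAccent(b).toLowerCase()));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("deAccent() calls for 1000 labels: " + countCallsForSort(1000));
    }
}
```

For 1000 labels the count should come out well above 1000, since a comparison sort needs many more comparisons than there are elements. Computing the de-accented form once per object would cap it at 1000.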


Solution

  • Following @Henry's excellent suggestion, compute the de-accented label once, in the constructor, instead of on every comparison. Formula might then look like this:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    public class Formula {
        private final String label;
        private final String deAccentedLabel;
    
        public Formula(String label) {
            this.label = label;
            this.deAccentedLabel = deAccent(label);
        }
    
        public String getLabel() {
            return label;
        }
    
        public String getDeAccentedLabel() {
            return deAccentedLabel;
        }
    
    
        private String deAccent(String str) {
            String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
            Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
            return pattern.matcher(nfdNormalizedString).replaceAll("");
        }
    
    }
    

    Then it can be used like this:

    Collections.sort(formulaList, (formula1, formula2) ->
            formula1.getDeAccentedLabel().toLowerCase().compareTo(formula2.getDeAccentedLabel().toLowerCase()));
    

    However, this exposes deAccentedLabel by adding the public getDeAccentedLabel() method.

    What I was suggesting in the comment is to hide deAccentedLabel to keep the public interface of Formula as clean as possible. So to sort, Formula provides the Comparator instead of other classes having to build it. Formula would look something like this:

    import java.text.Normalizer;
    import java.util.Comparator;
    import java.util.regex.Pattern;

    public class Formula {
        private final String label;
        private final String comparableLabel;
    
        public Formula(String label) {
            this.label = label;
            this.comparableLabel = deAccent(label).toLowerCase();
        }
    
        public String getLabel() {
            return label;
        }
    
        private String deAccent(String str) {
            String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
            Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
            return pattern.matcher(nfdNormalizedString).replaceAll("");
        }
    
        public static Comparator<Formula> getLabelComparator() {
            return (formula1, formula2) -> formula1.comparableLabel.compareTo(formula2.comparableLabel);
        }
    
    }
    

    and used like this:

    Collections.sort(formulaList, Formula.getLabelComparator());
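
To try the comparator approach end to end, here is a minimal runnable sketch using the three labels from the question. DemoFormula is a hypothetical stand-in for Formula, renamed so it does not clash with the class above. The static DIACRITICS pattern and the sortedLabels() helper are additions for the demo, not part of the original answer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Formula class above.
final class DemoFormula {
    // Compiled once for the whole class (an addition, not in the original answer).
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private final String label;
    private final String comparableLabel; // de-accented, lower-cased, computed once

    DemoFormula(String label) {
        this.label = label;
        this.comparableLabel = deAccent(label).toLowerCase();
    }

    String getLabel() {
        return label;
    }

    private static String deAccent(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS.matcher(nfd).replaceAll("");
    }

    static Comparator<DemoFormula> getLabelComparator() {
        return (a, b) -> a.comparableLabel.compareTo(b.comparableLabel);
    }

    // Demo helper: wraps the labels, sorts them, and returns the labels in sorted order.
    static List<String> sortedLabels(List<String> labels) {
        List<DemoFormula> formulas = new ArrayList<>();
        for (String l : labels) {
            formulas.add(new DemoFormula(l));
        }
        Collections.sort(formulas, getLabelComparator());
        List<String> result = new ArrayList<>();
        for (DemoFormula f : formulas) {
            result.add(f.getLabel());
        }
        return result;
    }

    public static void main(String[] args) {
        // "\u00C9aa" is "Éaa", escaped to avoid source-encoding issues.
        // Resulting order: Éaa, eaaa, Gab
        System.out.println(sortedLabels(Arrays.asList("Gab", "eaaa", "\u00C9aa")));
    }
}
```

Éaa lands before eaaa because its comparable form "eaa" is a prefix of "eaaa", and compareTo() orders a prefix first. The regex work now happens once per object at construction time rather than on every comparison.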