How to get all national characters for selected Locale?


In my app I need to generate passwords based on all available national characters, like:

private String generatePassword(String charSet, int passwordLength) {
    char[] symbols=charSet.toCharArray();
    StringBuilder sbPassword=new StringBuilder();
    Random wheel = new Random();

    for (int i = 0; i < passwordLength; i++) {
       int random = wheel.nextInt(symbols.length);
       sbPassword.append(symbols[random]);
    }
    return sbPassword.toString();
}

For Latin we have something like:

charSet="AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

How can I get a similar String containing all the national characters (the alphabet) for, let's say, Thai, Arabic, or Hebrew?

I mean, we all know that Unicode contains all the national characters available for any Locale, so there has to be a way to get them. Otherwise I'd be forced to hardcode national alphabets, which is ugly (my app supports more than 10 locales).


Solution

  • Since you're using char[], you won't be able to represent every Unicode code point in every script: some of them lie outside the Basic Multilingual Plane (BMP) and do not fit in a single char. Unfortunately, there is no easy way to get all the code points for a script without looping through them, like so:

    char[] charsForScript(Character.UnicodeScript script) {
      StringBuilder sb = new StringBuilder();
      // Scan the whole Basic Multilingual Plane, U+0000 through U+FFFF inclusive.
      for (int cp = 0; cp <= Character.MAX_VALUE; ++cp) {
        // Keep the code point if Unicode assigns it to the requested script.
        if (Character.isValidCodePoint(cp) && script == Character.UnicodeScript.of(cp)) {
          sb.appendCodePoint(cp);
        }
      }
      return sb.toString().toCharArray();
    }
    

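    This returns every char in the BMP that Unicode assigns to the given script, e.g., LATIN or GREEK. Note that this covers everything in the script, not just the letters of an alphabet.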

    To get all code points, even outside the BMP, you could use:

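    import java.util.ArrayList;
    import java.util.List;

    int[] charsForScript(Character.UnicodeScript script) {
      List<Integer> ints = new ArrayList<>();
      // Scan every code point, U+0000 through U+10FFFF inclusive.
      for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
        if (Character.isValidCodePoint(cp) && script == Character.UnicodeScript.of(cp)) {
          ints.add(cp);
        }
      }
      // Unbox the collected Integer values into an int[].
      return ints.stream().mapToInt(i -> i).toArray();
    }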
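
To tie this back to the password generator in the question, here is a minimal sketch of how the code-point approach might be wired in. The class name `ScriptPassword` and the helper `lettersForScript` are made up for illustration and are not part of any library. Two assumptions go beyond the original code: the sketch keeps only code points for which `Character.isLetter` is true (so standalone combining marks and digits are skipped), and it swaps `Random` for `SecureRandom`, which is the usual choice when generating passwords.

```java
import java.security.SecureRandom;
import java.util.stream.IntStream;

class ScriptPassword {

    // Hypothetical helper: every letter code point that Unicode assigns to the script.
    // Assumption: only letters are wanted, so marks, digits and symbols are filtered out.
    static int[] lettersForScript(Character.UnicodeScript script) {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(Character::isLetter)
                .filter(cp -> Character.UnicodeScript.of(cp) == script)
                .toArray();
    }

    // Same idea as generatePassword in the question, but working on code points
    // instead of chars so that characters outside the BMP are handled correctly.
    static String generatePassword(int[] codePoints, int passwordLength) {
        SecureRandom wheel = new SecureRandom();
        StringBuilder sbPassword = new StringBuilder();
        for (int i = 0; i < passwordLength; i++) {
            int random = wheel.nextInt(codePoints.length);
            // appendCodePoint writes a surrogate pair when the code point needs one.
            sbPassword.appendCodePoint(codePoints[random]);
        }
        return sbPassword.toString();
    }

    public static void main(String[] args) {
        int[] hebrew = lettersForScript(Character.UnicodeScript.HEBREW);
        System.out.println(generatePassword(hebrew, 12));
    }
}
```

Because a code point outside the BMP occupies two chars, the length of the result should be checked with `codePointCount` rather than `length()`. Scanning the full code point range is not free, so it is probably worth computing the array once per script and caching it.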