java regex date numbers words

Split Persian Date Numbers From Words in Java


I want to split a Persian date from the word stuck to it in Java. My string looks like: "۰۱/۰۷/۱۳۹۵سعید"

I have searched a lot, but I can't find an approach that works for me. In addition, the date format might be completely wrong, so the important thing is to separate the words from the numbers.

I want to get something like "۰۱/۰۷/۱۳۹۵ سعید"


Solution

  • Here is my solution. It adds spaces to the String as you requested. In my main method, I give سعید۰۱/۰۷/۱۳۹۵سعید as input and get سعید ۰۱/۰۷/۱۳۹۵ سعید printed on the console.

    public class StringPadder {

        // Zero-width position with a digit before it and a letter after it.
        private static final String BETWEEN_NUMBER_AND_LETTER = "(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})";
        // Zero-width position with a letter before it and a digit after it.
        private static final String BETWEEN_LETTER_AND_NUMBER = "(?<=\\p{IsAlphabetic})(?=\\p{IsDigit})";

        // Inserts a space at every digit/letter boundary, in either order.
        public static String addSpaces(String toPad) {
            return toPad.replaceAll(BETWEEN_NUMBER_AND_LETTER, " ").replaceAll(BETWEEN_LETTER_AND_NUMBER, " ");
        }

        public static void main(String[] args) {
            String toTest = "سعید۰۱/۰۷/۱۳۹۵سعید";
            System.out.println(addSpaces(toTest));
        }
    }
    

    This relies on a few regular expression features.

    • The expression \p{IsDigit} matches a digit in any script; so not just 0-9, but also Arabic/Persian digits, Devanagari digits and so on.
    • The expression \p{IsAlphabetic} matches any character with the Unicode Alphabetic property, which in practice means a letter in any script; so not just A-Z and a-z but also the Arabic/Persian alphabet and other alphabets.
    • When you see (?<=X) in a regular expression, it means that the match you're looking for must be preceded by something that matches X, but the match for X won't be part of the match that you find. This is called a "lookbehind", because it says "look behind what you're matching, and see if it's X".
    • When you see (?=X) in a regular expression, it means that the match you're looking for must be followed by something that matches X, but the match for X won't be part of the match that you find. This is called a "lookahead", because it says "look ahead of what you're matching, and see if it's X".
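The four points above can be checked with a small sketch. It is only an illustration: the class name LookaroundDemo and the helper firstBoundary are hypothetical, and the sample string is written with Unicode escapes (assumed here to spell the string from the question) so that the code does not depend on the encoding of the source file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {

    // Zero-width position with a digit behind it and a letter ahead of it.
    static final Pattern DIGIT_THEN_LETTER =
            Pattern.compile("(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})");

    // Hypothetical helper: index of the first such position, or -1 if none.
    static int firstBoundary(String input) {
        Matcher m = DIGIT_THEN_LETTER.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // U+06F5 is the Persian digit five, U+0633 is the Persian letter seen.
        System.out.println("\u06F5".matches("\\p{IsDigit}"));      // true
        System.out.println("\u0633".matches("\\p{IsAlphabetic}")); // true
        System.out.println("/".matches("\\p{IsDigit}"));           // false

        // Ten characters of date (digits and slashes) followed by a four-letter name.
        String input = "\u06F0\u06F1/\u06F0\u06F7/\u06F1\u06F3\u06F9\u06F5"
                + "\u0633\u0639\u06CC\u062F";
        Matcher m = DIGIT_THEN_LETTER.matcher(input);
        if (m.find()) {
            // start and end are equal: the match consumes no characters.
            System.out.println(m.start() + " " + m.end());         // 10 10
        }
    }
}
```

The equal start and end indices show that the lookbehind and lookahead only test the neighbouring characters; neither of them becomes part of the match.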

    Putting all that together, I've included two regular expressions in the code, namely BETWEEN_NUMBER_AND_LETTER and BETWEEN_LETTER_AND_NUMBER. Each of these matches "nothing at all", that is, a zero-width position between two characters, because it consists only of a lookbehind and a lookahead and contains nothing that consumes a character. So BETWEEN_NUMBER_AND_LETTER matches "nothing at all" with a digit before it and a letter after it; and BETWEEN_LETTER_AND_NUMBER matches "nothing at all" with a letter before it and a digit after it.

    What you need to do is to replace either of those nothings with a space, because that will separate any letter from any number, provided they were consecutive characters in the original String. That's what my addSpaces method does - it first puts a space at any point in the String where there was a number immediately followed by a letter, then it puts a space at any point where there was a letter immediately followed by a number.
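As a possible variation, which is not part of the solution above, the two expressions can be joined with | into one precompiled Pattern, so the string is scanned once and the pattern is not recompiled on every call. The sketch below assumes that approach; SinglePassPadder and splitParts are hypothetical names, and the Persian sample is again written with Unicode escapes that are assumed to spell the text from the question.

```java
import java.util.regex.Pattern;

class SinglePassPadder {

    // Either boundary: digit then letter, or letter then digit.
    private static final Pattern DIGIT_LETTER_BOUNDARY = Pattern.compile(
            "(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})"
            + "|(?<=\\p{IsAlphabetic})(?=\\p{IsDigit})");

    // Inserts one space at every boundary in a single pass.
    static String addSpaces(String toPad) {
        return DIGIT_LETTER_BOUNDARY.matcher(toPad).replaceAll(" ");
    }

    // Hypothetical helper: returns the pieces instead of a padded string.
    static String[] splitParts(String toSplit) {
        return DIGIT_LETTER_BOUNDARY.split(toSplit);
    }

    public static void main(String[] args) {
        String name = "\u0633\u0639\u06CC\u062F";
        String date = "\u06F0\u06F1/\u06F0\u06F7/\u06F1\u06F3\u06F9\u06F5";

        // Prints the name, the date and the name again, separated by spaces.
        System.out.println(addSpaces(name + date + name));

        // The slashes are neither digits nor letters, so the date stays whole.
        System.out.println(splitParts(date + name).length); // 2
    }
}
```

Both versions should give the same result, because inserting a space never creates a new place where a digit touches a letter.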

    My test case in main demonstrates that this does what you required.