I tried the usual Unicode-aware regex patterns, but I can't get them to work on strings that contain characters other than the traditional A-Z letters and digits.
I am working with examples from multiple languages whose scripts are not part of the A-Z alphabet family.
import re
text = "20किटल"
res = re.sub(r"^[^\W\d_]+$", lambda ele: " " + ele[0] + " ", text)
Output:
20किटल
2nd try:
regexp1 = re.compile(r'^[^\W\d_]+$', re.IGNORECASE | re.UNICODE)
res = regexp1.sub(lambda ele: " " + ele[0] + " ", text)
Output:
20किटल
Expected output:
**20 किटल**
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
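import regex  # third-party module (pip install regex); supports \p{...} classes
text = "20किटल"
# Match the empty position that has a digit before it and a letter after it
pat = regex.compile(r"(?<=\d)(?=\p{L})", regex.UNICODE)
res = pat.sub(" ", text)
print(res)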
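Here \p{L} matches any letter in any language (the Unicode category Letter). The two lookarounds match the empty position between a digit and a letter, so sub() inserts a space there without consuming any characters.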
Output:
20 किटल
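
If installing the third-party regex module is not an option, a similar result should be possible with the standard re module. This is only a sketch, assuming Python 3, where \w and \d are Unicode-aware for str patterns. It uses [^\W\d_] as a rough stand-in for \p{L}; that class can also match a few non-decimal numeric characters, so it is an approximation. The function name add_space_after_digits is made up for illustration.

```python
import re

# Zero-width match: preceded by a digit, followed by a word character
# that is neither a decimal digit nor an underscore (roughly "a letter").
# Assumption: [^\W\d_] is used here as an approximation of \p{L}.
_DIGIT_THEN_LETTER = re.compile(r"(?<=\d)(?=[^\W\d_])")


def add_space_after_digits(text):
    """Insert a space wherever a digit is directly followed by a letter."""
    return _DIGIT_THEN_LETTER.sub(" ", text)


print(add_space_after_digits("20किटल"))  # 20 किटल
```

Only the digit-to-letter boundary is handled here, which is all the example string needs. The original attempts failed for a different reason: the ^ and $ anchors require the whole string to consist of letters, and this one starts with digits.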