I have a plain text file.
> Input: इंजेक्शन इंटरनॅशनल इंटिग्रेटेड इंटिरिअर इंडस्ट्री
All words are separated by one or more spaces. I want to collect all unique chars from the text file. I'm looking for a unix command; the order of the result chars is not important.
> Expected result: इं जे क्श न ट र नॅ श ल इ्रे टे ड टि रिअ र ड स्ट्री
With the command Klaus provided,
cat <file>|sed -e 's/\(.\)/\1\n/g'|sort -u|tr -d '\n'
the result comes out as:
ं अ इ क ग ज ट ड न र ल श सिीॅे्
I don't want to separate horizontal or vertical conjuncts or dependent vowels from their base characters.
I just want to separate the complete characters in a word from each other.
Can we achieve this with UNIX commands?
"base character" + "dependent vowel" = "complete character"
- क + ा = का
- क + ि = कि
Klaus's command works for English text only; it doesn't work with Indic languages such as Hindi.
Input: hi1 hello-2 how!3 "are4 ?you5
Result: h i e l o w a r y u 1 2 3 4 5 - ! "
Note: You need Indic language support installed in your OS. Also download the Mangal font from http://hindi-fonts.com/fonts/Mangal
Try this:
cat <file>|sed -e 's/\(.\)/\1\n/g'|sort -u|tr -d '\n'
Or simplified (stolen from fedorqui's comment, thanks! I had never seen &
in the replacement part before. Good to learn something new!):
sed 's/./&\n/g' <file> | sort -u | tr -d '\n'