I want to add a space after each character in a text file.
in.txt
在吗??
嗯
你让我看的那款手提是不是11寸的,很小的?
看来还是美国的便宜啊
应该是吧
out.txt
在 吗 ? ?
嗯
你 让 我 看 的 那 款 手 提 是 不 是 1 1 寸 的 , 很 小 的 ?
看 来 还 是 美 国 的 便 宜 啊
应 该 是 吧
I've tried this (How to remove/add spaces in all textfiles?) but it outputs:
� � � � � � � � � � � �
� � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 1 1 � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � �
How do I achieve out.txt?
I've also tried:
$ perl -F'' -C -lane 'print join " ", @F' in.txt
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_PAPER = "de_DE.UTF-8",
LC_ADDRESS = "de_DE.UTF-8",
LC_MONETARY = "de_DE.UTF-8",
LC_NUMERIC = "de_DE.UTF-8",
LC_TELEPHONE = "de_DE.UTF-8",
LC_IDENTIFICATION = "de_DE.UTF-8",
LC_MEASUREMENT = "de_DE.UTF-8",
LC_TIME = "de_DE.UTF-8",
LC_NAME = "de_DE.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
� � � � � � � � � � � �
� � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 1 1 � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � �
And
$ cat in.txt
在吗??
嗯
你让我看的那款手提是不是11寸的,很小的?
看来还是美国的便宜啊
应该是吧
$ sed 's/\s/g;s/./& /g' in.txt
sed: -e expression #1, char 10: unknown option to `s'
There seems to be something wrong with my locale:
$ locale
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=
To fix it, I had to run:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
And then:
$ perl -F'' -C -lane 'print join " ", @F' in.txt
在 吗 ? ?
嗯
你 让 我 看 的 那 款 手 提 是 不 是 1 1 寸 的 , 很 小 的 ?
看 来 还 是 美
Assuming you have a UTF-8 locale set up correctly, you can use this Perl one-liner:
perl -F'' -C -lane 'print join " ", @F' in.txt > out.txt
The -a switch splits each input line on the field separator, which -F'' sets to an empty string, so each character becomes a separate element of the array @F. Since this uses join, no space is added after the last character on the line (it's not clear whether there should be one or not).
Another option is to use a substitution:
perl -C -pe 's/(.)/$1 /g' in.txt > out.txt
This will add a space after every character, including the last one.