python pandas cjk python-module-unicodedata

Convert full-width numbers into normal numbers in Python

I have data in an Excel file (only one column) where there are several Japanese characters followed by full-width numbers. I want to convert these numbers into normal (ASCII) numbers.

いつもありがとう８９０ございます
忙しい７ー１０ー１ところ

There are several rows like these.

What can I do so that these rows look like this:

いつもありがとう890ございます
忙しい7ー10ー1ところ

I tried this, but I am not sure whether it is the right way to do it:

s = unicodedata.normalize('NFKC', df.to_string())

Solution

Assuming an example like the following, in which col1 is the column to process:

import pandas as pd

df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                            'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                   'col2': [1, 2]
                  })

You can use apply to normalize each string in the column:

import unicodedata
from functools import partial

df['col1'] = df['col1'].apply(partial(unicodedata.normalize, 'NFKC'))

Variant:

df['col1'] = df['col1'].apply(lambda s: unicodedata.normalize('NFKC', s))
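Another variant, as a sketch: pandas also exposes the same normalization through the .str accessor (Series.str.normalize), so the explicit apply can be dropped. The DataFrame is rebuilt here only so that the snippet runs on its own; the name result is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                            'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                   'col2': [1, 2]
                  })

# Series.str.normalize applies Unicode normalization to each string in the column
df['col1'] = df['col1'].str.normalize('NFKC')

# Collect the converted strings (illustrative helper variable)
result = df['col1'].tolist()
print(result[0])
```

Missing values (NaN) are passed through by the .str accessor, whereas the apply variants would raise on a non-string value.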

Output:

                            col1  col2
0  いつもありがとう890ございます 忙しい7ー10ー1ところ     1
1  いつもありがとう890ございます 忙しい7ー10ー1ところ     2
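Note that NFKC is not limited to digits: it also converts other compatibility characters, for example full-width Latin letters to ASCII and half-width katakana to full-width. If only the digits should change, a translation table restricted to the ten full-width digits (U+FF10 to U+FF19) is one possible alternative. The following is a minimal sketch; the names FULLWIDTH_DIGITS and to_ascii_digits are hypothetical and not part of any library.

```python
# Map only the full-width digits U+FF10..U+FF19 to ASCII '0'..'9'
# (FULLWIDTH_DIGITS and to_ascii_digits are illustrative names)
FULLWIDTH_DIGITS = {0xFF10 + i: str(i) for i in range(10)}

def to_ascii_digits(text):
    """Replace full-width digits with ASCII digits; leave everything else as-is."""
    return text.translate(FULLWIDTH_DIGITS)

sample = 'ＡＢＣ８９０'
converted = to_ascii_digits(sample)
print(converted)  # the full-width letters are kept, only the digits change
```

On the DataFrame above this could be applied with df['col1'] = df['col1'].apply(to_ascii_digits).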