python pandas cjk python-module-unicodedata

Convert full-width numbers into normal numbers in Python


I have data in an Excel file (only one column) in which Japanese characters are mixed with full-width numbers. I want to convert these numbers into normal (half-width) numbers.

いつもありがとう８９０ございます
忙しい７ー１０ー１ところ

There are several rows like these.

What can I do so that these rows look like this?

いつもありがとう890ございます
忙しい7ー10ー1ところ

I tried the following, but I am not sure whether this is the right way to do it:

s = unicodedata.normalize('NFKC', df.to_string())

Solution

  • Assuming an example like the following, in which col1 is the column to process:

    import pandas as pd

    # Sample data: col1 holds text with full-width digits, col2 is left untouched.
    df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                                'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                       'col2': [1, 2]
                      })
    

    You can use apply:

    import unicodedata
    from functools import partial
    
    df['col1'] = df['col1'].apply(partial(unicodedata.normalize, 'NFKC'))
    

    Variant:

    df['col1'] = df['col1'].apply(lambda s: unicodedata.normalize('NFKC', s))
    

    Output:

                                col1  col2
    0  いつもありがとう890ございます 忙しい7ー10ー1ところ     1
    1  いつもありがとう890ございます 忙しい7ー10ー1ところ     2
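
Note that NFKC normalization is not limited to digits: it also rewrites other compatibility characters, for example full-width Latin letters become ASCII and half-width katakana become full-width. If only the digits should change, a translation table is one possible alternative. The following is a minimal sketch under that assumption; the names `FULLWIDTH_DIGITS` and `to_ascii_digits` are hypothetical helpers introduced here, not part of the original answer.

```python
# Translation table mapping the full-width digits U+FF10..U+FF19 to ASCII 0-9.
FULLWIDTH_DIGITS = str.maketrans('０１２３４５６７８９', '0123456789')


def to_ascii_digits(text):
    """Replace full-width digits with ASCII digits; leave all other characters as they are."""
    return text.translate(FULLWIDTH_DIGITS)


print(to_ascii_digits('忙しい７ー１０ー１ところ'))
# Full-width letters are kept, only the digits are converted.
print(to_ascii_digits('ＡＢＣ１２３'))
```

With the DataFrame above, this could be applied in the same way as the normalization, e.g. `df['col1'] = df['col1'].apply(to_ascii_digits)`.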