python pandas dataframe nlp data-preprocessing

How to separate a specific number from text data in Python


I have a pandas DataFrame:

id     adress

0     Jame Homie Street. N:60 5555242424 La
1     London. 2322325234243 Stw St. N 8 St.bridge
2     32424244234 ddd st. ss Sk. N 63 Manchester
3     Mou st 147 Rochester Liv 33424245223

I want to separate out the long numbers (such as 5555242424, 2322325234243, 32424244234, 33424245223) and create a new feature from them.

Sample output :

id     adress                                           number

0     Jame Homie Street. N:60 La                      5555242424 
1     London. Stw St. N 8 St.bridge                   2322325234243 
2     ddd st. ss Sk. N 63 Manchester                  32424244234 
3     Mou st 147 Rochester Liv                        33424245223

Solution

  • Assuming you want to extract the first number that has at least 4 digits (so it ignores 60, 8, 63, 147 in your example), you can use:

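    # Capture the first run of 4+ digits into a new column
    df_payers["number"] = df_payers["adress"].str.extract(r"(\d{4,})", expand=False)
    # Remove every run of 4+ digits from the address text
    df_payers["adress"] = df_payers["adress"].str.replace(r"\d{4,}", "", regex=True)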
    
    >>> df_payers
       id                           adress         number
    0   0      Jame Homie Street. N:60  La     5555242424
    1   1   London.  Stw St. N 8 St.bridge  2322325234243
    2   2   ddd st. ss Sk. N 63 Manchester    32424244234
    3   3        Mou st 147 Rochester Liv     33424245223
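The output above still contains the double spaces left where the number used to be. To try the same pattern on plain strings and tidy that whitespace, here is a minimal sketch using only the standard `re` module. The helper name `split_number` is hypothetical (it is not part of pandas or of the answer above). Unlike the `str.replace` call, it removes only the first long number, and it assumes collapsing repeated spaces in the address is acceptable.

```python
import re

# Same pattern as in the answer: a run of 4 or more digits.
LONG_NUMBER = re.compile(r"\d{4,}")

def split_number(address):
    """Return (address without the first long number, that number or None)."""
    match = LONG_NUMBER.search(address)
    if match is None:
        return address, None
    # Cut the number out, then collapse the whitespace left behind.
    remainder = address[:match.start()] + " " + address[match.end():]
    return " ".join(remainder.split()), match.group()

rows = [
    "Jame Homie Street. N:60 5555242424 La",
    "London. 2322325234243 Stw St. N 8 St.bridge",
    "32424244234 ddd st. ss Sk. N 63 Manchester",
    "Mou st 147 Rochester Liv 33424245223",
]
for row in rows:
    print(split_number(row))
```

The short numbers (60, 8, 63, 147) stay in the address because they have fewer than 4 digits. If a row-wise approach suits your data, a function like this could be applied to the column with `Series.apply`, though the vectorized `str.extract` / `str.replace` calls are usually the simpler choice.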