I have a column of phone numbers. They are usually formatted as (555) 123-4567, but sometimes they are in a different format or are not valid numbers at all. I am trying to convert this field so it contains only the digits, removing any non-numeric characters, but only when there are exactly 10 digits.
How can I apply a function that extracts just the digits when the field contains exactly 10 of them?
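One way to express the "exactly 10 digits" condition for the whole column at once is `Series.str.count` with the regex `\d`, which counts the digit characters in each value. A minimal sketch; the sample values are made up for illustration:

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({"PHONE": ["(555) 123-4567", "555-1234", "N/A"]})

# Count the digit characters in each value
digit_counts = df["PHONE"].str.count(r"\d")

# Boolean mask: True only where the field holds exactly 10 digits
has_ten_digits = digit_counts == 10
print(has_ten_digits.tolist())  # [True, False, False]
```

The mask can then be used to decide which rows get converted and which are left alone.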
I tried:

```python
df['PHONE'] = df['PHONE'].str.extract(r'(\d+)', expand=False)
```
But this only extracts the first run of digits (the area code). How do I pull out all of the digits, and only run the extraction when there are exactly 10 digits in the field?
My expected output is 5551234567.
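The attempt with `str.extract` stops at the area code because it returns only the first match of the capture group in each value. `str.findall` returns every match instead, and `str.join("")` concatenates the pieces. A small illustration with made-up numbers:

```python
import pandas as pd

# Made-up sample values for illustration
phones = pd.Series(["(555) 123-4567", "555.123.4567"])

# str.extract keeps only the FIRST match of the capture group per value
first_run = phones.str.extract(r"(\d+)", expand=False)
print(first_run.tolist())  # ['555', '555']

# str.findall returns a list of EVERY match; str.join glues them together
all_digits = phones.str.findall(r"\d+").str.join("")
print(all_digits.tolist())  # ['5551234567', '5551234567']
```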
Figured it out. I wrote a function and applied it to my PHONE column:
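```python
def extractNums(number):
    # Leave missing or non-string values (e.g. NaN) unchanged
    if not isinstance(number, str):
        return number
    # Keep only the digit characters: "(555) 123-4567" -> "5551234567"
    new_number = list(filter(str.isdecimal, number))
    # Replace the value only if exactly 10 digits were found
    if len(new_number) == 10:
        return "".join(new_number)
    return number

df['PHONE'] = df['PHONE'].apply(extractNums)
```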
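As an alternative to a row-by-row `apply`, the same rule can be sketched with pandas' vectorized string methods: `str.replace` with the regex `\D` (any non-digit character) strips the formatting, and `Series.where` keeps the stripped value only where exactly 10 digits remain, falling back to the original value otherwise. The sample data below is made up for illustration:

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({"PHONE": ["(555) 123-4567", "555-1234", "N/A", "1 (555) 123-4567"]})

# Strip every non-digit character: "(555) 123-4567" -> "5551234567"
digits = df["PHONE"].str.replace(r"\D", "", regex=True)

# Use the digits-only version where exactly 10 digits remain,
# otherwise keep the original value
df["PHONE"] = digits.where(digits.str.len() == 10, df["PHONE"])

print(df["PHONE"].tolist())
# ['5551234567', '555-1234', 'N/A', '1 (555) 123-4567']
```

Note that the last value is left unchanged because it contains 11 digits (a leading country code), which matches the "exactly 10" rule in the question.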