What I ideally want is data in this particular format. I've shown some below to get an idea
Country Code | Phone_number | Corrected Phone Number |
---|---|---|
GB | 0131 496 0902 | +441314960902 |
GB | 0141 4960760 | +441414960760 |
GB | +44117 4960589 | +441174960589 |
US | 191.040.0076x2520 | +11910400076x2520 |
DE | 05532 56994 | +49553256994 |
GB | 01514960233 | |
GB | (0118)4960720 | +441184960720 |
GB | +44(0)114 4960045 | |
DE | +49(0)1743095110 | |
GB | +44(0)1632 960 720 | |
GB | +44117 496 0381 |
I haven't filled out all the corrected numbers but what i was attempted was, first to use regex to take out any parenthesis and dots from all phone numbers then replacing the 0 with the country code which will be taken from a dictionary i.e
{
GB : +44
DE : +49
US : +1
}
This hadn't worked for me since matching the parenthesis pattern, caused some issues.
The logic may change with rules of phone number format; but a logic that fits your requirement is as follows:
+
and letters used for extension e.g. x
, ext
(following keeps all alphabets).0
that follows ISD code.df = pd.DataFrame(data=[["GB","0131 496 0902"],["GB","0141 4960760"],["GB","+44117 4960589"],["US","191.040.0076x2520"],["DE","05532 56994"],["GB","01514960233"],["GB","(0118)4960720"],["GB","+44(0)114 4960045"],["DE","+49(0)1743095110"],["GB","+44(0)1632 960 720"],["GB","+44117 496 0381"]], columns=["Country Code","Phone_number"])
isd_code_map = { "GB": "+44", "DE": "+49", "US": "+1" }
def correct_phone_number(row):
import re
# Remove special chars other than digits, `+` and letters used for extension e.g. `x`, `ext` (following keeps all alphabets).
result = re.sub("[^A-Za-z\d\+]", "", row["Phone_number"])
# Prefix ISD code by matching country code.
if not result.startswith(isd_code_map[row["Country Code"]]):
result = isd_code_map[row["Country Code"]] + result
# Remove `0` that follows ISD code.
if result.startswith(isd_code_map[row["Country Code"]] + "0"):
result = result.replace(isd_code_map[row["Country Code"]] + "0", isd_code_map[row["Country Code"]])
return result
df["Corrected Phone Number"] = df.apply(correct_phone_number, axis=1)
print(df)
Output:
Country Code Phone_number Corrected Phone Number
0 GB 0131 496 0902 +441314960902
1 GB 0141 4960760 +441414960760
2 GB +44117 4960589 +441174960589
3 US 191.040.0076x2520 +11910400076x2520
4 DE 05532 56994 +49553256994
5 GB 01514960233 +441514960233
6 GB (0118)4960720 +441184960720
7 GB +44(0)114 4960045 +441144960045
8 DE +49(0)1743095110 +491743095110
9 GB +44(0)1632 960 720 +441632960720
10 GB +44117 496 0381 +441174960381