I have a dataset like the one below, in which the telephone numbers have different lengths and formats.
Could you help me put them into a standard format using R?
TelephoneData <- data.frame(
  FIRST = c("STAN", "FIONA", "JOHN", "VERA", "ROBERT", "ANGIE", "PAUL", "GEORGE", "JUDITH", "TREVOR", "KEN", "BRIAN", "GLADYS", "MARY", "MARY", "JOSHUA",
            "BRIAN", "PHILLIP", "KATE", "BRIAN"),
  PHONE = c("+44 1152 195298", "07366 602865", "01160 979447", "01597 501161", "01232 637283", "01296 230679", "(07183) 151418", "(07995) 376450",
            "(0208) 0511522", "+44 208 3960687", "(01544) 668176", "(07540) 940315", "0208 4137611", "(01472) 119737", "(0208) 6494623",
            "(01156) 145807", "07731 566115", "(0207) 7270589", "(0207) 7542812", "(01205) 835056")
)
This might be useful as well:
TelephoneData$TelNr <- gsub("\\+44", "0", gsub("[() ]", "", TelephoneData$PHONE)) # replace +44 with 0; remove spaces and brackets
TelephoneData$TelNr <- gsub("([0-9]{5})(.*)", "\\1 \\2", TelephoneData$TelNr) # insert a space after the first 5 digits
TelephoneData <- TelephoneData[order(TelephoneData$TelNr), ] # sort the rows by the column TelNr
This gives the following result:
#     FIRST           PHONE        TelNr
#1     STAN +44 1152 195298 01152 195298
#16  JOSHUA  (01156) 145807 01156 145807
#3     JOHN    01160 979447 01160 979447
#20   BRIAN  (01205) 835056 01205 835056
#5   ROBERT    01232 637283 01232 637283
#6    ANGIE    01296 230679 01296 230679
#14    MARY  (01472) 119737 01472 119737
#11     KEN  (01544) 668176 01544 668176
#4     VERA    01597 501161 01597 501161
#18 PHILLIP  (0207) 7270589 02077 270589
#19    KATE  (0207) 7542812 02077 542812
#9   JUDITH  (0208) 0511522 02080 511522
#10  TREVOR +44 208 3960687 02083 960687
#13  GLADYS    0208 4137611 02084 137611
#15    MARY  (0208) 6494623 02086 494623
#7     PAUL  (07183) 151418 07183 151418
#2    FIONA    07366 602865 07366 602865
#12   BRIAN  (07540) 940315 07540 940315
#17   BRIAN    07731 566115 07731 566115
#8   GEORGE  (07995) 376450 07995 376450
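If you need to reuse this, the same steps could be wrapped in a small function. The sketch below is only one way to do it: the name normalise_phone is made up for illustration, and it assumes every valid number has 11 digits once +44 is replaced by 0 (which holds for the sample data). Anything that does not fit that pattern is returned as NA rather than being silently reformatted. It keeps the same 5 + 6 digit split as the code above.

```r
# Hypothetical helper (the name is an assumption, not an existing function).
# Assumes an 11-digit national number starting with 0 after clean-up.
normalise_phone <- function(x) {
  digits <- gsub("[() ]", "", x)         # remove brackets and spaces
  digits <- sub("^\\+44", "0", digits)   # a leading +44 becomes 0
  ok <- grepl("^0[0-9]{10}$", digits)    # exactly 11 digits, starting with 0
  out <- sub("^([0-9]{5})([0-9]{6})$", "\\1 \\2", digits)  # split as 5 + 6 digits
  out[!ok] <- NA_character_              # flag anything unexpected
  out
}

normalise_phone(c("+44 1152 195298", "(0208) 0511522", "12345"))
# [1] "01152 195298" "02080 511522" NA
```

With the data frame from the question it would be applied as TelephoneData$TelNr <- normalise_phone(TelephoneData$PHONE), followed by the same order() call as before.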
Hope this helps!