
After strsplit, the output is not in the expected format


My input file called "locaddr" has the following records:

"Shelbourne Road, Dublin, Ireland"                                     
"1 Hatch Street Upper, Dublin, Ireland"                               
"98 Haddington Road, Dublin, Ireland"       
"11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland"
"Winterstraße 17, 69190 Walldorf, Germany"

I applied the strsplit function in R to this file using the following code:

testmat <- strsplit(locaddr,split=",")
outmat <- matrix(unlist(testmat), nrow=nrow(locaddr), ncol=3, byrow=T)

The final output I get is:

      Street                        City                    Country
 [1,] "Shelbourne Road"             " Dublin"               " Ireland"       
 [2,] "1 Hatch Street Upper"        " Dublin"               " Ireland"       
 [3,] "98 Haddington Road"          " Dublin"               " Ireland"       
 [4,] "11 Mount Argus Close"        " Harold's Cross"       " Dublin 6W"     
 [5,] " Co. Dublin"                 " Ireland"              "Winterstraße 17"
 [6,] " 69190 Walldorf"             " Germany"              "Caughley Road"  
 [7,] " Broseley"                   " Shropshire TF12 5AT"  " UK"            
 [8,] "Pappelweg 30"                " 48499 Salzbergen"     " Germany"       
 [9,] "60 Grand Canal Street Upper" " Dublin 4"             " Ireland"       
[10,] "Wieslocher Straße"           " 68789 Sankt Leon-Rot" " Germany"

As the output above shows, the result I wanted was the final three terms of each record. Instead, addresses with more than three terms spill over into the following rows, so the columns end up mixed.

Although the addresses vary in length, after strsplit I need to pick only the last three terms of each one and store them as Street, City, Country.

Your help and time are most appreciated.


Solution

  • Next time, please include some reproducible code with your question.

    Here is how I would try to solve this problem.

    x <- c("Shelbourne Road, Dublin, Ireland",                                     
           "1 Hatch Street Upper, Dublin, Ireland",                               
           "98 Haddington Road, Dublin, Ireland",      
           "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
           "Winterstraße 17, 69190 Walldorf, Germany")
    
    # split on ,
    splitx <- strsplit(x, ",")
    
    # for every list element (lapply climbs the list element-wise)
    # subset last 3 elements
    last3 <- lapply(splitx, tail, n = 3)
    
    # merge them together by row
    do.call("rbind", last3)
    
         [,1]                   [,2]              [,3]      
    [1,] "Shelbourne Road"      " Dublin"         " Ireland"
    [2,] "1 Hatch Street Upper" " Dublin"         " Ireland"
    [3,] "98 Haddington Road"   " Dublin"         " Ireland"
    [4,] " Dublin 6W"           " Co. Dublin"     " Ireland"
    [5,] "Winterstraße 17"      " 69190 Walldorf" " Germany"
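
The result above still carries the leading spaces left by splitting on "," and has no column names. A minimal sketch of one way to tidy that up, assuming every address has at least three comma-separated terms (shorter records are not handled here). The names `parts`, `n_terms` and `addr` are only illustrative, and the column labels are taken from the question. It also uses `lengths()` to show why filling a 3-column matrix from `unlist()` shifts the rows: not every record has exactly three terms.

```r
# Sample addresses copied from the question
x <- c("Shelbourne Road, Dublin, Ireland",
       "1 Hatch Street Upper, Dublin, Ireland",
       "98 Haddington Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

# Split on commas; each list element holds the terms of one address
parts <- strsplit(x, ",", fixed = TRUE)

# Number of terms per record; any value other than 3 explains why
# matrix(unlist(...), ncol = 3, byrow = TRUE) misaligns later rows
n_terms <- lengths(parts)

# Keep the last three terms of each record and strip surrounding whitespace
last3 <- lapply(parts, function(p) trimws(tail(p, 3)))

# Bind the pieces row-wise and label the columns
addr <- do.call(rbind, last3)
colnames(addr) <- c("Street", "City", "Country")
addr
```

Note that for the fourth address the "Street" column holds "Dublin 6W", because the last three terms are taken purely by position. If that is not the split you want, the selection rule would need to be adapted to your data.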