
R - Splitting a character vector so that every unique string becomes an element of a new character vector


I have a character vector in which single elements contain multiple strings separated by commas. I obtained this vector by extracting it from a data frame, and it looks like this:

 [1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"                                                                              
 [2] "Ferncroft, Passaconaway, Paugus Mill"                                                                                                   
 [3] "Alexandria, South Alexandria"                                                                                                           
 [4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"                                                                                   
 [5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"                                                                 
 [6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
 [7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"                                                                                        
 [8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"                                                        
 [9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"                                                                    
[10] "Ashland" 
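Because this vector came from a data frame, it is worth confirming that it is really character and not a factor. Before R 4.0.0, data.frame() and read.csv() converted strings to factors by default, and strsplit() gives an error for factor input. A minimal sketch, using a hypothetical data frame df with a hypothetical column places:

```r
# Hypothetical data frame; stringsAsFactors = TRUE mimics the pre-4.0.0 default
df <- data.frame(
  places = c("Alexandria, South Alexandria", "Ashland"),
  stringsAsFactors = TRUE
)

is.factor(df$places)    # TRUE, so strsplit(df$places, ", ") would fail

# Convert the factor to its labels before splitting
myvec <- as.character(df$places)
is.character(myvec)     # TRUE
```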

I would like to obtain a new character vector in which every individual string is a separate element, i.e.:

 [1] "Acworth"          "Crescent Lake"    "East Acworth"     "Lynn"             "South Acworth"
 [6] "Ferncroft"        "Passaconaway"     "Paugus Mill"      "Alexandria"       "South Alexandria"

I used the strsplit() function, but it returns a list. When I try to turn that list back into a character vector, I end up with the original comma-separated strings again.
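strsplit() returns a list because each input element can split into a different number of pieces; unlist() then concatenates those pieces into one flat character vector. A small sketch using the first three elements from the question (the name myvec is assumed, matching the answer below):

```r
# First three elements of the vector from the question
myvec <- c(
  "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth",
  "Ferncroft, Passaconaway, Paugus Mill",
  "Alexandria, South Alexandria"
)

parts <- strsplit(myvec, split = ", ")
class(parts)     # "list": one character vector per input element
length(parts)    # 3

# unlist() joins the per-element vectors into a single character vector
flat <- unlist(parts)
length(flat)     # 10 (5 + 3 + 2 pieces)
flat[1:3]        # "Acworth" "Crescent Lake" "East Acworth"
```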

I'm sure this is a really simple problem; any help would be greatly appreciated. Thanks!


Solution

  • Your post title suggests you want unique strings, so

    unique(unlist(strsplit(myvec, split=",")))
    

    or

    unique(unlist(strsplit(myvec, split=", ")))
    

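    if there is always a space after each comma. With split="," alone, every string after the first in each element keeps a leading space (e.g. " Crescent Lake"), so it will not match the same name without the space.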
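If the spacing after the commas is inconsistent, either approach can be made robust. The sketch below splits on a regular expression matching a comma followed by any amount of whitespace (",\\s*"), and alternatively splits on the bare comma and strips the leftover spaces with trimws(). unique() keeps the first occurrence of each string, in order. The sample vector is made up to show irregular spacing and a duplicate:

```r
# Made-up input with irregular spacing and a repeated name
myvec <- c("Alton,Alton Bay,  East Alton", "Ashland, Alton Bay")

# Option 1: let the regex consume any whitespace after each comma
out1 <- unique(unlist(strsplit(myvec, split = ",\\s*")))

# Option 2: split on the comma only, then trim surrounding whitespace
out2 <- unique(trimws(unlist(strsplit(myvec, split = ","))))

# Both give c("Alton", "Alton Bay", "East Alton", "Ashland")
identical(out1, out2)   # TRUE
```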