Search code examples
rdplyrextracttidyrseparator

Extract IDs from Text in many columns


I have a huge df which is separated by different symbols (#,space,,). After reading the data I have columns which aren't separated yet. I tried to use tidyr::separate to separate the columns. But as I have many columns the result of the separation overwrites the next column.

I tried to use ''tidyr::separate'' to separate the columns. But as I have many columns the result of the separation overwrites the next column.

the content of one column looks like this: Markt#ID=3002349#X=8612088#Y=50228881#RI=1|&RD=10062019&RT=6:46&L=1543&SM=0#1#2&CT=-1#-1#-1#-1#-1&P1=1111111111111111&O1=99999999&T1=99999999&F=-1&RF=1999991999999999111090999999&GP=FF=cost=-1|minDist=0|maxDist=2000|speed=100|routingType=1|ellipseRadius=1000|maxDur=120|waitTime=-1|linDistRouting=0|#BF=cost=-1|minDist=0|maxDist=2000|speed=100|routingType=1|ellipseRadius=1000|maxDur=120|waitTime=-1|linDistRouting=0|#&

All in all I have 12 columns from which I want to extract the IDs and the coordinates.

Expected results:

 ID       X         Y
 3002349  8612088  50228881

Solution

  • If all the rows have the same format you can use tidyr::extract

    tidyr::extract(df, v1, c("ID", "X", "Y"), regex = ".*ID=(\\d+)#X=(\\d+)#Y=(\\d+)")
    #       ID       X        Y
    #1 3002349 8612088 50228881
    

    This extracts numbers which are followed by ID, X and Y labels from the text.

    data

    x <- "Markt#ID=3002349#X=8612088#Y=50228881#RI=1|&RD=10062019&RT=6:46&L=1543&SM=0#1#2&CT=-1#-1#-1#-1#-1&P1=1111111111111111&O1=99999999&T1=99999999&F=-1&RF=1999991999999999111090999999&GP=FF=cost=-1|minDist=0|maxDist=2000|speed=100|routingType=1|ellipseRadius=1000|maxDur=120|waitTime=-1|linDistRouting=0|#BF=cost=-1|minDist=0|maxDist=2000|speed=100|routingType=1|ellipseRadius=1000|maxDur=120|waitTime=-1|linDistRouting=0|#&"
    df <- data.frame(v1 = x)