How would I go about extracting, for each row (there are ~56,000 records in an Excel file) in a specific column, only part of a string? I need to keep all text to the left of the last '/' forward slash. The challenge is that not all cells have the same number of '/'. There is always a filename (*.wav) at the end of the last '/', but the number of characters in the filename is not always the same (sometimes 5 and sometimes 6).
Below are some examples of the strings in the cells:
cloch/51.wav
grand/Grand_bombarde/02-suchy_Grand_bombarde/038-D.wav
grand/Grand_bombarde/02-suchy_Grand_bombarde/039-D#.wav
AB_AeolinaL/025-C#.wav
AB_AeolinaL/026-D.wav
AB_violadamourL/rel99999/091-G.wav
AB_violadamourL/rel99999/092-G#.wav
AB_violadamourR/024-C.wav
AB_violadamourR/025-C#.wav
The extracted text should be:
cloch
grand/Grand_bombarde/02-suchy_Grand_bombarde
grand/Grand_bombarde/02-suchy_Grand_bombarde
AB_AeolinaL
AB_AeolinaL
AB_violadamourL/rel99999
AB_violadamourL/rel99999
AB_violadamourR
AB_violadamourR
Can anyone recommend a strategy using R?
You can use the stringr
package str_remove(string,pattern)
function like:
str = "grand/Grand_bombarde/02-suchy_Grand_bombarde/038-D.wav"
str_remove(str,"/[0-9]+[-]*[A-Z]*[#]*[.][a-z]+")
Output:
> str_remove(str,"/[0-9]+[-]*[A-Z]*[#]*[.][a-z]+")
[1] "grand/Grand_bombarde/02-suchy_Grand_bombarde"
Then you can just iterate over all other strings:
strings <- c("cloch/51.wav",
"grand/Grand_bombarde/02-suchy_Grand_bombarde/038-D.wav",
"grand/Grand_bombarde/02-suchy_Grand_bombarde/039-D#.wav",
"AB_AeolinaL/025-C#.wav",
"AB_AeolinaL/026-D.wav",
"AB_violadamourL/rel99999/091-G.wav",
"AB_violadamourL/rel99999/092-G#.wav",
"AB_violadamourR/024-C.wav",
"AB_violadamourR/025-C#.wav")
str_remove(strings,"/[0-9]+[-]*[A-Z]*[#]*[.][a-z]+")
Output:
> str_remove(strings,"/[0-9]+[-]*[A-Z]*[#]*[.][a-z]+")
[1] "cloch"
[2] "grand/Grand_bombarde/02-suchy_Grand_bombarde"
[3] "grand/Grand_bombarde/02-suchy_Grand_bombarde"
[4] "AB_AeolinaL"
[5] "AB_AeolinaL"
[6] "AB_violadamourL/rel99999"
[7] "AB_violadamourL/rel99999"
[8] "AB_violadamourR"
[9] "AB_violadamourR"