I am trying to extract artist names and titles from a list of strings, but it is a bit complicated. Here is the list:
nlist <- c(
"Lil' SlimLil' Slim feat. PxMxWxPxMxWx Where Your Ward At!!",
"I Like It (Mannie Fresh Style)I Like It (Mannie Fresh Style)Ms. Tee",
"Bella VistaBella Vista Mister Wong",
"Tom WareTom WareChina Town",
"Race 'N RhythmRace 'N Rhythm Teenage Girls",
"Ronald MarquisseRonald MarquisseElectro Link 7",
"PleasurePleasure Thoughts Of Old Flames",
"OM, OM, Dom Um RomaoDom Um Romao Chipero",
"HookfaceHookface4 07 181221"
)
Here is the pattern in the strings.
Description:
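Rows 1 and 8 are very hard and I couldn't solve them, but the code below solves my problem for the other rows.
library(stringr)  # provides str_trim() and str_match()
title = str_trim(gsub('(.+?)\\1','', nlist))
artist = str_match(nlist, '(.+?)\\1')[,2]  # column 2 holds the first capture group
data = cbind(title,artist);data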
And here is the output of the code above:
title artist
[1,] "feat. PxMxWxPxMxWx Where Your Ward At!!" "Lil' Slim"
[2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
[3,] "Mister Wong" "Bella Vista"
[4,] "China Town" "Tom Ware"
[5,] "Teenage Girls" "Race 'N Rhythm"
[6,] "Electro Link 7" "Ronald Marquisse"
[7,] "Thoughts Of Old Flames" "Pleasure"
[8,] "Chipero" "OM, "
[9,] "4 07 181221" "Hookface"
Problem: When the string contains "feat." or ",", the match on the repeated sequence is cut short. Question: How can I extract the full artist names, as shown below?
My expected result is here (see rows 1 and 8):
title artist
[1,] "Where Your Ward At!!" "Lil' Slim feat. PxMxWx"
[2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
[3,] "Mister Wong" "Bella Vista"
[4,] "China Town" "Tom Ware"
[5,] "Teenage Girls" "Race 'N Rhythm"
[6,] "Electro Link 7" "Ronald Marquisse"
[7,] "Thoughts Of Old Flames" "Pleasure"
[8,] "Chipero" "OM, Dom Um Romao"
[9,] "4 07 181221" "Hookface"
Thanks...
Maybe the following extracts what you want. I remove everything up to and including the last repetition and store the remainder in title. To get the artist, I use substr to cut the length of the previously found title off the end of the string, and then collapse the repetitions in the artist part using gsub with (.{2,})\\1, but this will also collapse repetitions in the conjunction.
title <- sub(".*(.{2,})\\1\\s*", "", nlist)
artist <- trimws(gsub("(.{2,})\\1", "\\1"
, substr(nlist, 1, nchar(nlist) - nchar(title)), perl=TRUE))
cbind(title,artist)
# title artist
# [1,] "Where Your Ward At!!" "Lil' Slim feat. PxMxWx"
# [2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
# [3,] "Mister Wong" "Bella Vista"
# [4,] "China Town" "Tom Ware"
# [5,] "Teenage Girls" "Race 'N Rhythm"
# [6,] "Electro Link 7" "Ronald Marquisse"
# [7,] "Thoughts Of Old Flames" "Pleasure"
# [8,] "Chipero" "OM, Dom Um Romao"
# [9,] "4 07 181221" "Hookface"
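To make the three steps easier to follow, here is a minimal walk-through on a single input (row 1 of nlist). It is only a sketch: the variable names s and prefix are mine, and I pass perl = TRUE to every call so that the backreferences are handled by PCRE, which is an assumption on my side (the code above uses the default engine for sub).

```r
# Walk through the first approach on one string (row 1 of nlist).
s <- "Lil' SlimLil' Slim feat. PxMxWxPxMxWx Where Your Ward At!!"

# Step 1: the greedy ".*" pushes the match to the LAST doubled run of at
# least two characters, so everything up to and including "PxMxWxPxMxWx "
# is removed and only the title remains.
title <- sub(".*(.{2,})\\1\\s*", "", s, perl = TRUE)

# Step 2: whatever precedes the title is the (still doubled) artist part.
prefix <- substr(s, 1, nchar(s) - nchar(title))

# Step 3: collapse each doubled run to a single copy, then trim whitespace.
artist <- trimws(gsub("(.{2,})\\1", "\\1", prefix, perl = TRUE))

print(title)   # "Where Your Ward At!!"
print(prefix)  # "Lil' SlimLil' Slim feat. PxMxWxPxMxWx "
print(artist)  # "Lil' Slim feat. PxMxWx"
```

The {2,} quantifier is what keeps single doubled characters such as the "!!" at the end of the title from being treated as a repetition.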
Another way might be:
x <- sub("^(.*)\\1\\s*", "", nlist) #Remove the first repetition of artist
title <- sub(".*?(.{2,})\\1\\s*", "", x) #Remove Conjunction and repetition of Artist if there is one
artist <- trimws(gsub("(.{2,})\\1", "\\1"
, substr(nlist, 1, nchar(nlist) - nchar(title)), perl=TRUE))
cbind(title,artist)
# title artist
# [1,] "Where Your Ward At!!" "Lil' Slim feat. PxMxWx"
# [2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
# [3,] "Mister Wong" "Bella Vista"
# [4,] "China Town" "Tom Ware"
# [5,] "Teenage Girls" "Race 'N Rhythm"
# [6,] "Electro Link 7" "Ronald Marquisse"
# [7,] "Thoughts Of Old Flames" "Pleasure"
# [8,] "Chipero" "OM, Dom Um Romao"
# [9,] "4 07 181221" "Hookface"
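If you want to reuse this, the second variant could be wrapped in a small function along these lines. This is a sketch only: the name split_artist_title is hypothetical, and perl = TRUE on every call is my assumption rather than part of the code above.

```r
# Hypothetical helper wrapping the second approach; the function name and
# the use of perl = TRUE throughout are assumptions made for this sketch.
split_artist_title <- function(x) {
  # Drop the leading doubled artist name, e.g. "Tom WareTom Ware".
  rest <- sub("^(.*)\\1\\s*", "", x, perl = TRUE)
  # If another doubled run follows (e.g. a featured artist), drop it
  # together with the conjunction in front of it.
  title <- sub(".*?(.{2,})\\1\\s*", "", rest, perl = TRUE)
  # The artist part is everything before the title, with doubled runs
  # collapsed to a single copy.
  prefix <- substr(x, 1, nchar(x) - nchar(title))
  artist <- trimws(gsub("(.{2,})\\1", "\\1", prefix, perl = TRUE))
  data.frame(artist = artist, title = title, stringsAsFactors = FALSE)
}

res <- split_artist_title(c(
  "Tom WareTom WareChina Town",
  "OM, OM, Dom Um RomaoDom Um Romao Chipero"
))
print(res)
```

For these two inputs, res$artist should be "Tom Ware" and "OM, Dom Um Romao", and res$title should be "China Town" and "Chipero". A title that itself contains a doubled run of two or more characters would be cut by the second sub, so treat this as a starting point rather than a general parser.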