Let's say I have a dataframe
Author | Lyrics
Name1  | Text (characters)
Name2  | Text (characters)
I want to create another column by applying a function that, for each row, takes the text from the Lyrics column, splits it on whitespace, and then checks each token against another vector I made, so that I can work out the percentage of tokens in the text that appear in that vector.
The function as I have written so far is below
ReturnPercentPosWord = function(textLyrics){
  WhitespaceSplitText = strsplit(textLyrics, " ")
  LengthSplitText = length(WhitespaceSplitText)
  CountInPosList = 0
  for (i in WhitespaceSplitText) {
    if (i %in% PositiveWords$word) {
      CountInPosList = CountInPosList+1
    }
  }
  if (CountInPosList == 0) {
    return(0)
  }
  PercentInPos = (CountInPosList/LengthSplitText)*100
  return(PercentInPos)
}
I want to apply this function to each row now. I have tried
TestPOSwordsDF$PercentPositiveWords = ReturnPercentPosWord(TestPOSwordsDF$Lyrics)
and
TestPOSwordsDF$PercentPositiveWords = apply(TestPOSwordsDF[, c('Lyrics'),drop=F], 1, ReturnPercentPosWord)
But I get a message saying
the condition has length > 1 and only the first element will be used
I would really appreciate any help with this. Thank you!
Try using this:
TestPOSwordsDF$PercentPositiveWords <- sapply(
  strsplit(TestPOSwordsDF$Lyrics, " "),
  function(x) mean(x %in% PositiveWords$word) * 100)
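Here we split Lyrics on spaces, which gives a list with one character vector of tokens per row. For each row, x %in% PositiveWords$word is a logical vector, and its mean is the proportion of tokens present in PositiveWords$word; multiplying by 100 turns that into a percentage. Your original function fails because strsplit returns a list, so the for loop iterates over whole token vectors rather than single tokens, and the if condition receives a vector of length greater than 1.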
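If you would rather keep a per-row function, here is a minimal sketch of how the original could be repaired. The toy contents of PositiveWords and TestPOSwordsDF are made up for illustration, and the guard for empty text is my own assumption about what you would want in that case (without it, mean() of an empty vector gives NaN).

```r
# Toy data standing in for the real objects (hypothetical values).
PositiveWords <- data.frame(word = c("love", "happy", "sun"),
                            stringsAsFactors = FALSE)
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2"),
  Lyrics = c("love is a happy song", "rain rain go away"),
  stringsAsFactors = FALSE
)

# Works on ONE string at a time.
ReturnPercentPosWord <- function(textLyrics) {
  # strsplit() returns a list; [[1]] unwraps the token vector for this string.
  words <- strsplit(textLyrics, " ", fixed = TRUE)[[1]]
  # Assumed behaviour: treat empty text as 0 percent.
  if (length(words) == 0) return(0)
  # Proportion of tokens found in the positive word list, as a percentage.
  mean(words %in% PositiveWords$word) * 100
}

# Apply the function to each element of the Lyrics column.
TestPOSwordsDF$PercentPositiveWords <- vapply(
  TestPOSwordsDF$Lyrics, ReturnPercentPosWord, numeric(1), USE.NAMES = FALSE
)

print(TestPOSwordsDF$PercentPositiveWords)
```

With this toy data the first row has 2 positive tokens out of 5 and the second has none. vapply is used instead of sapply only so that the result is guaranteed to be a numeric vector with one value per row; if Lyrics is stored as a factor, wrap it in as.character() first.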