I have a large data frame marking occurrences of trigrams in strings, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string.
Something like this:
strs <- c('this', 'that', 'chat', 'chin')
thi <- c(1, 0, 0, 0)
tha <- c(0, 1, 0, 0)
hin <- c(0, 0, 0, 1)
hat <- c(0, 1, 1, 0)
df <- data.frame(strs, thi, tha, hin, hat)
df
#   strs thi tha hin hat
# 1 this   1   0   0   0
# 2 that   0   1   0   1
# 3 chat   0   0   0   1
# 4 chin   0   0   1   0
I want to get all of the columns/trigrams that have a 1 for a given row or a given string.
So for row 2, the string 'that', the result would be a data frame that looks like this:
  strs tha hat
1 this   0   0
2 that   1   1
3 chat   0   1
4 chin   0   0
How could I do this?
This will give you the desired data frame:
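givenStr <- "that"
# Select the row for the given string.
row <- df[df$strs == givenStr, ]
# Keep column 1 (strs) plus every trigram column that is 1 in that row;
# the 1 + shifts the indices back after dropping strs with row[, -1].
df[, c(1, 1 + which(row[, -1] == 1))]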
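If you need this for many strings, the same idea can be wrapped in a function. This is only a sketch: the name trigram_cols_for and its id_col argument are made up for illustration, and it assumes that a string with no matching row should raise an error and that, if a string appears in more than one row, a trigram counts as present when any of those rows has a 1.

```r
# Rebuild the example data from the question.
df <- data.frame(
  strs = c("this", "that", "chat", "chin"),
  thi  = c(1, 0, 0, 0),
  tha  = c(0, 1, 0, 0),
  hin  = c(0, 0, 0, 1),
  hat  = c(0, 1, 1, 0)
)

# Hypothetical helper (not part of the answer above): keep the id column
# plus every trigram column that is 1 in the row(s) matching given_str.
trigram_cols_for <- function(data, given_str, id_col = "strs") {
  hits <- data[[id_col]] == given_str
  if (!any(hits)) {
    stop("no row found for string: ", given_str)
  }
  trigram_names <- setdiff(names(data), id_col)
  # colSums() > 0 flags a trigram if any matching row has a 1 in it.
  present <- colSums(data[hits, trigram_names, drop = FALSE] == 1) > 0
  # drop = FALSE keeps the result a data frame even with a single column.
  data[, c(id_col, trigram_names[present]), drop = FALSE]
}

res <- trigram_cols_for(df, "that")
names(res)
# [1] "strs" "tha"  "hat"
```

Selecting columns by name rather than by position avoids the 1 + offset, so it should keep working if the strs column is not the first one.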