
Compare values in rows and identify duplicates


I have a feeling this is a pretty simple one. I have a data frame that looks something like this:

ID   Genre1         Genre2
1    Comedy         Comedy
2    Drama          Drama
3    Sport          Sport
4    Drama          Comedy
5    Documentary    Documentary
6    Entertainment  Entertainment
7    Film           Film
8    Drama          Crime Drama
9    Crime Drama    Drama

I want to identify which rows have the same value in both columns (e.g. "Comedy" and "Comedy") and create a new column called Match that labels them "Yes" (or "No" for rows that don't match).

Based on the sample above, the expected output should look something like this:

ID   Genre1         Genre2          Match
1    Comedy         Comedy          Yes
2    Drama          Drama           Yes
3    Sport          Sport           Yes
4    Drama          Comedy          No
5    Documentary    Documentary     Yes
6    Entertainment  Entertainment   Yes
7    Film           Film            Yes
8    Drama          Crime Drama     No
9    Crime Drama    Drama           No

Any ideas how I could go about doing this and/or what package would be best? Thanks in advance!


Solution

  • Use ifelse to compare the two columns element-wise:

    df$Match <- ifelse(df$Genre1 == df$Genre2, 'Yes', 'No')
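For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same ifelse call. The data frame name df comes from the answer. The assumption that both genre columns are plain character vectors is mine, since the question doesn't say how the data was loaded.

```r
# Rebuild the sample data from the question.
# Assumption: the genre columns are character vectors, not factors.
df <- data.frame(
  ID = 1:9,
  Genre1 = c("Comedy", "Drama", "Sport", "Drama", "Documentary",
             "Entertainment", "Film", "Drama", "Crime Drama"),
  Genre2 = c("Comedy", "Drama", "Sport", "Comedy", "Documentary",
             "Entertainment", "Film", "Crime Drama", "Drama"),
  stringsAsFactors = FALSE
)

# If the columns were read in as factors, converting them first avoids
# errors when the two columns have different level sets:
# df$Genre1 <- as.character(df$Genre1)
# df$Genre2 <- as.character(df$Genre2)

# Element-wise comparison of the two columns, mapped to "Yes"/"No".
df$Match <- ifelse(df$Genre1 == df$Genre2, "Yes", "No")

print(df)
```

Note that == is an exact, case-sensitive comparison, so "Drama" and "Crime Drama" come out as "No", which is what the expected output asks for. If either column contains NA, the corresponding Match value will be NA rather than "No".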