I have a df (see the reference image) that aggregates every combination of publishers and then runs calculations on each combination.
I want to pull every distinct combination that contains exactly 2 publishers, along with all the other field values tied to that pair. For example, Amazon, CBS would appear twice, since there is one row for month 10 and one for month 11, and so on.
How do I extract these rows, or apply some dplyr function to pull only those? I was thinking of using a regex function in a pipe, but I'm not sure how to do it.
Publisher | month_grp
Amazon, CBS | 10
Amazon, CBS | 11
Amazon, CW | 10
Amazon, CW | 11
Amazon, ESPN | 10
Amazon, ESPN | 11
I think you want the rows with exactly one comma in the publishers column. You can get these using
df[grep('^[^,]+,[^,]+$', df$publishers),]
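To see what the pattern keeps, here is a minimal sketch on a made-up data frame. The column names publishers and month_grp and the sample rows are assumptions, since the real df is only shown in the image. It uses grepl (a logical vector) rather than grep (row indices); both select the same rows here. The dplyr version at the end only runs if dplyr is installed.

```r
# Toy data (hypothetical); the real df comes from the question's image.
df <- data.frame(
  publishers = c("Amazon", "Amazon, CBS", "Amazon, CBS", "Amazon, CBS, CW"),
  month_grp  = c(10, 10, 11, 10),
  stringsAsFactors = FALSE
)

# Exactly one comma: a run of non-comma characters, a comma, another run.
pairs <- df[grepl('^[^,]+,[^,]+$', df$publishers), ]
print(pairs)

# The same filter written as a dplyr pipe, if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  pairs_dplyr <- df %>% filter(grepl('^[^,]+,[^,]+$', publishers))
  print(pairs_dplyr)
}
```

Only the two Amazon, CBS rows (months 10 and 11) should survive; the single publisher and the three-publisher row are dropped.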
UPDATE: Based on a refinement of the question in the comments.
If you want the rows that have either one or two publishers, you can use:
df[grep('^[^,]+(,[^,]+)?$', df$publishers),]
Enclosing the ,[^,]+ part in ( )? makes it optional, so this will get rows with either one or two publishers.
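A quick way to check the optional group is to run the pattern against a few sample strings. The strings below are made up for illustration and are not taken from the actual data.

```r
# Pattern from the answer: one publisher, optionally followed by a second.
pattern <- '^[^,]+(,[^,]+)?$'

# Hypothetical sample values with one, two, and three publishers.
x <- c("Amazon", "Amazon, CBS", "Amazon, CBS, CW")

# TRUE where the string contains zero or one comma, FALSE otherwise.
matches <- grepl(pattern, x)
print(matches)
```

The three-publisher string fails because the optional group can absorb only one ,[^,]+ segment before the $ anchor has to match.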