I have a problem very similar to that described here:
subset of data.frame columns to maximize "complete" observations
I am trying to schedule a workshop that will meet five times. I have ten days from which to choose meeting dates, and each day has three overlapping possible meeting times. Hence, I have 30 columns grouped into ten groups (days) of three columns (meeting times) each. I need to select 5 columns (date-time combinations) subject to the following criteria: at most one meeting time is selected per day (at most one column per group), and the number of respondents (rows) who can attend all 5 meetings is maximized. Ideally, I would also like to know how the optimal column selection changes if I relax the requirement that respondents attend ALL 5 meetings, requiring only that they attend 4, or 3, etc.
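For reference, the selection problem above can be written as a 0/1 integer program. This formulation is my own sketch, not taken from the linked question. Here $a_{rc} = 1$ if respondent $r$ can attend column $c$, $C_d$ is the set of columns (times) on day $d$, $x_c$ marks a selected column, $y_r$ marks a counted respondent, $k = 5$ is the number of meetings, and $m$ is the minimum number of selected meetings a counted respondent must be able to attend ($m = k$ for "all 5").

$$
\begin{aligned}
\max_{x,\,y}\quad & \sum_r y_r \\
\text{s.t.}\quad & \sum_{c \in C_d} x_c \le 1 \qquad \forall d \\
& \sum_c x_c = k \\
& \sum_c a_{rc}\, x_c \ge m\, y_r \qquad \forall r \\
& x_c,\ y_r \in \{0, 1\}
\end{aligned}
$$

Lowering $m$ from 5 to 4 or 3 gives the relaxed versions of the question without changing anything else.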
As a simplified example, suppose I want to choose two columns, at most one each from V1, V2, and V3, so as to maximize the number of rows that have no zeros in the chosen columns (i.e., whose sum over those two columns is 2).
```
V1A V1B V1C V2A V2B V2C V3A V3B V3C
1 0 1 0 1 1 1 0 1
1 1 0 0 1 1 0 1 1
0 0 1 1 1 0 0 1 1
1 1 1 1 0 0 1 0 0
1 0 0 0 1 1 0 1 0
0 1 1 0 1 1 0 0 0
1 0 1 1 1 0 1 0 1
```
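For the toy example, a brute-force sketch is small enough to be practical: enumerate all 36 column pairs with combn, drop the 9 pairs that come from the same group, and count the rows with a 1 in both chosen columns. The variable names here (grp, pairs, n_ok, best_labels) are my own, and the group label is assumed to be the column name minus its last character.

```r
# Toy data from the question: 7 respondents (rows) x 9 columns in groups V1, V2, V3
df <- read.table(header = TRUE, text = "
V1A V1B V1C V2A V2B V2C V3A V3B V3C
1 0 1 0 1 1 1 0 1
1 1 0 0 1 1 0 1 1
0 0 1 1 1 0 0 1 1
1 1 1 1 0 0 1 0 0
1 0 0 0 1 1 0 1 0
0 1 1 0 1 1 0 0 0
1 0 1 1 1 0 1 0 1
")

grp   <- sub(".$", "", names(df))                     # "V1A" -> "V1", etc.
pairs <- combn(ncol(df), 2)                           # all 36 column-index pairs
pairs <- pairs[, grp[pairs[1, ]] != grp[pairs[2, ]]]  # keep the 27 cross-group pairs
# for each pair, count rows with a 1 in both columns (row sum equals 2)
n_ok  <- apply(pairs, 2, function(p) sum(rowSums(df[p]) == 2))
best  <- pairs[, n_ok == max(n_ok), drop = FALSE]     # all tied best pairs
best_labels <- apply(best, 2, function(p) paste(names(df)[p], collapse = "+"))
max(n_ok)
best_labels
```

On this data the best pairs cover 4 rows, and three choices tie: V1A+V2B, V1C+V2B, and V2B+V3C. Filtering all column combinations this way is only sensible while the number of columns stays small.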
The actual data are here: https://drive.google.com/file/d/0B03dE9-8088aMklOUVhuV3gtRHc/view. The groups are mon1*, tue1*, [...], mon2*, tue2*, [...], fri2*.
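With names like these, the day groups could be derived from the column-name prefix. This is a sketch with hypothetical column names: the real suffixes after mon1, tue1, etc. are not shown here, so "a"/"b"/"c" are placeholders, and the regular expression assumes every name starts with a three-letter day followed by the week digit 1 or 2.

```r
# Hypothetical column names following the mon1*, tue1*, ..., fri2* pattern;
# the real suffixes after the day prefix are not known here.
cols <- c("mon1a", "mon1b", "mon1c", "tue1a", "tue1b", "tue1c",
          "fri2a", "fri2b", "fri2c")
# group label = three-letter day name plus the week digit (1 or 2)
day <- sub("^([a-z]{3}[12]).*$", "\\1", cols)
split(cols, day)
```

Unlike dropping only the last character, this keeps working if the time suffix is longer than one character.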
The code proposed in the linked question would solve my problem were it not for the requirement that columns be selected from groups. I would also like to know which columns to choose to maximize the number of rows under the weaker condition that a row may contain zeros (i.e., its sum over the selected columns is at least 4, or at least 3, etc., rather than exactly 5).
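Since there are only ten days, an exhaustive search over "one column per chosen group" also looks feasible for the real data: choose(10, 5) × 3^5 = 252 × 243 = 61,236 candidate schedules. The sketch below uses a hypothetical helper, best_schedule (not from the linked question or any package). It picks k groups, takes exactly one column from each, and counts rows whose sum over the chosen columns is at least m, so m = k means "attends all" and smaller m gives the relaxed versions. All tied selections are returned.

```r
# Exhaustive search: choose k groups (days), exactly one column (time) from
# each, and count rows (respondents) available for at least m of the k
# chosen meetings. Returns the best count and every tied selection.
best_schedule <- function(mat, groups, k, m = k) {
  mat <- as.matrix(mat)
  group_cols <- split(colnames(mat), groups)   # column names for each group
  best_n <- -1L
  best_sets <- list()
  for (g in combn(names(group_cols), k, simplify = FALSE)) {
    # all ways to take one column from each of the chosen groups
    grid <- as.matrix(expand.grid(group_cols[g], stringsAsFactors = FALSE))
    for (i in seq_len(nrow(grid))) {
      cols <- unname(grid[i, ])
      n <- sum(rowSums(mat[, cols, drop = FALSE]) >= m)
      if (n > best_n) {
        best_n <- n
        best_sets <- list(cols)
      } else if (n == best_n) {
        best_sets <- c(best_sets, list(cols))
      }
    }
  }
  list(n_rows = best_n, columns = best_sets)
}

# Toy data from the question as a 0/1 matrix
toy <- matrix(c(1,0,1,0,1,1,1,0,1,
                1,1,0,0,1,1,0,1,1,
                0,0,1,1,1,0,0,1,1,
                1,1,1,1,0,0,1,0,0,
                1,0,0,0,1,1,0,1,0,
                0,1,1,0,1,1,0,0,0,
                1,0,1,1,1,0,1,0,1),
              nrow = 7, byrow = TRUE,
              dimnames = list(NULL, c("V1A", "V1B", "V1C", "V2A", "V2B",
                                      "V2C", "V3A", "V3B", "V3C")))
groups <- sub(".$", "", colnames(toy))   # "V1A" -> "V1", etc.

# Relax the requirement from "all k meetings" down to "at least 1"
for (m in 2:1) {
  res <- best_schedule(toy, groups, k = 2, m = m)
  cat("m =", m, ": best =", res$n_rows, "rows\n")
}
```

For the real data the call would be along the lines of best_schedule(dat, day, k = 5, m = 5), then m = 4 and m = 3, where dat and day stand for the survey matrix and its per-column day labels.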
Many thanks!
You could use rowSums to get the indices of the rows that have at least two 1s within each group. (The conditions are not entirely clear to me.)
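```r
# df: the example data from the question
# Split the column names into groups by dropping the last character
# (V1A -> V1), then find the rows with at least two 1s in each group
lapply(split(names(df), sub('.$', '', names(df))),
       function(x) which(rowSums(df[x]) >= 2))
#$V1
#[1] 1 2 4 6 7
#$V2
#[1] 1 2 3 5 6 7
#$V3
#[1] 1 2 3 7
```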