I'm doing some recreational data analysis and after spending a the day learning how to use VLOOKUP() I've hit a wall I don't think I can figure out through just googling. I have a dataset of a bunch of ideas, linked to the people I want to do them with, a bunch of names associated with them, and the subjects of the ideas. I've been doing fun stuff like how many ideas each subject has, how much of a person's ideas are about a given subject, and how much of a subject's ideas are with a given person. I've also been working out some stats for the whole sheet: total ideas, average ideas per person, average ideas per subject, who has the largest "subject monopoly", whatever seems interesting.
I want to have a stat for who has the most diverse range of subjects, which I would want to indicate by checking who has the lowest highest "subject focus" (how much of a person's ideas are a given subject). I know how I would do this with javascript or python: I want to run through all the "subject focus" values, dividing them up by person, and discarding all but the highest per-person. Then I compare each of these "maximum subject focuses", and pick out the lowest, and display the name associated.
Despite how clear it is in my head, I have no idea how to actually start doing that in Google Sheets. Any advice? Has anyone had to perform a similar operation?
Edit: So the closest I've gotten is to make a new column populated with the formula =if(MAX(FILTER(G:G, A:A=A3)) = 0, "", MAX(FILTER(G:G, A:A=A3)))
, where G:G is where I have the "subject focus" of a given person, and A:A is the name of the person (it starts at A3 because rows 1 and 2 are header). Then, to display the stat, I just have =CONCATENATE("Most diverse idea list: ", VLOOKUP(MIN(J3:J), {J3:J,A3:A}, 2, FALSE), " (", TEXT(ROUND(MIN(J3:J) * 100, 2), "#.00"), "% max focus)")
, where J:J is the new column I added. It works, but obviously this is incredibly messy and if there's a way to do this without making a whole new column for data storage that would be much preferred.
Edit 2: Here's a general mockup of the sheet I'm using, since I'd rather not share it directly.
Use query()
to get the max value in G
grouped by the names in column A
, and wrap it in another query()
to get just the first result. Put this formula in cell N1
:
=query(
query(
A1:H,
"select A, max(G)
where A is not null
and G is not null
group by A",
1
),
"order by Col2 asc
limit 1",
1
)
...and this formula in cell I16
:
="Most diverse idea list: " & N2 & " (" & text(O2, "#.00%") & " max focus)"
To see how this works, try putting this formula in cell K1
:
=query(
A1:H,
"select A, max(G)
where A is not null
and G is not null
group by A",
1
)
If you need to place these results in column J
for each name, put this formula in cell J3
:
=arrayformula(
iferror(
vlookup(
A3:A,
K2:L,
columns(K2:L),
false
)
)
)
This is an array formula that fills whole column in one go, so you will have to clear column J3:J
before inserting the formula to make room for the results.
To put the first two formulas together so that no helper columns are needed, use this:
="Most diverse idea list: " &
join( " (",
query(
query(
A2:H,
"select A, max(G)
where A is not null
and G is not null
group by A
label max(G) ''
format max(G) '0.00%' ",
0
),
"order by Col2 asc
limit 1",
0
)
) & " max focus)"