I have 2 columns of data in a Google Sheet. Column1 is unique words or sentences (words are repeated in sentences) and the Column2 is a numeric value next to each (say votes). I am trying to get a list of unique words from Column1 and then the sum of values (votes) from Column2 when the word was present either on its own or in a sentence.
Here is a sample of the data I am working with in Google Sheets:
Term Votes
apple 20
apple eat 100
orange 30
orange rules 40
rule why 50
This is what the end result looks like:
Word Votes
apple 120
eat 100
orange 70
rules 40
rule 50
why 50
The way I am doing it now is quite long and I am not sure if this is the best solution.
Here's my solution:
JOIN
values in Column1 using a delimiter " " and then SPLIT
them using the same delimiter and then TRANSPOSE
them into a column all in one step. This way I have a list of all the words used in Column1 in say Column3.UNIQUE
values and then do a COUNTIF
for the unique values from Column3. This way I am able to get the frequency of each unique word by referencing to the lsit of all words.TRANSPOSE
Column4 and then QUERY
Column1 and Column2 by using dynamic text in the formula. The formula looks like =QUERY(Column1:Column2, "SELECT SUM(Column2) WHERE Column1 CONTAINS '" & referenceToUniqueWord & "'", 1)
. The reason I have to transpose first is because the query formula outputs 2 cells of data ie Text: sumColumn1 and Number: 'sum of votes'. Since for one cell of unique word I get two cells of data I am not able to drag the formula down and hence I have to do it horizontally.QUERY
formula. And third row is the actual sum of votes, resulting from individual QUERY
formulae. I then transpose these rows to columns and to get my final table I VLOOKUP
the frequency values arrived at earlier.This approach is lengthy and prone to errors. Also doesn't work if the list is large and in the initial JOIN
I get an error of limit 50,000 reached. Any ideas to make it better are welcome. I know this can be done much easier using Scripts but I'd prefer to have it done using only formulae.
try:
=ARRAYFORMULA(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(
IF(IFERROR(SPLIT(A:A, " "))="",,"♠"&SPLIT(A:A, " ")&"♦"&B:B)
,,999^99)),,999^99), "♠")), "♦"),
"select Col1,sum(Col2)
group by Col1
order by sum(Col2) desc
label sum(Col2)''"))