Search code examples
google-sheetslambdagoogle-sheets-formulaarray-formulascounting

Finding Unique 2 Word Pairs In Google Sheets And Then Counting The Frequency From A Range ("Phrase Word Cloud")


Please see: Find every Unique 2 word pairings from a column of terms in Google Sheets

In the above question the asker was was looking for unique 2 word pairings and gave the example of having 3 words in a cell.

The formula given works if there is only 1 to 3 words in a cell, but as soon as you go over (4+ words) the formula returns a #value Error "in ARRAY_LITERAL, an Array Literal was missing values for one or more rows"

=arrayformula(query(flatten({query({if(A:A<>"",iferror(trim(split(regexreplace(regexreplace(regexreplace(lower(A:A),"[^A-Za-z\ \']+",""),"([\w\']+\ [\w\']+)","$1,"),"(.*\,)(.*)","$1"),",")),),)},"where Col1 is not null",0);query({if(A:A<>"",iferror(trim(split(regexreplace(regexreplace(regexreplace(lower(if(len(A:A)=len(regexreplace(A:A," ",""))+1,,A:A)),"[^A-Za-z\ \']+",""),"(\w*)(\ )([\w\']+\ [\w\']+)","$3,"),"(.*\,)(.*)","$1"),",")),),)},"where Col1 is not null",0)}),"select Col1,count(Col1) where Col1 is not null group by Col1 label Col1 'Word pairs',Count(Col1) 'Qty' ",0))

Is it possible to make the formula work with cells that include 25, 50, 100+ words and then be able to exclude common words from a column like done in Extracting and counting unique word frequency from a range

   =ArrayFormula(query(transpose(split(query(regexreplace(A1:A5, "[^A-Za-z\s/']" ,""),,50000)," ")), "Select Col1, Count(Col1) where not UPPER(Col1) matches '\b"&textjoin("|", 1, UPPER(J1:J20))&"\b'  group by Col1 order by Count(Col1) desc label Count(Col1)''"))

I'm trying to find unique word pairings in sentences and then count how many times they appear.

I've edited my question to try to make it more readable (sorry still don't completely now how to use stack-overflow formatting or respond to comments etc.)

TEXT column and EXCLUSIONS is the input

UNIQUE PAIRS and COUNT would be the Output

TEXT UNIQUE PAIRS COUNT EXCLUSIONS
I wake up with night sweats and struggle with staying asleep. night sweats 2 I wake
I wake up from night sweats and struggle with dry hair. staying asleep 1 wake up
dry hair 1 up with
with night
sweats and
and struggle
with staying
struggle with
up from
from night
with dry

Purpose is to be able to find common word pairings like

bed head, dry hair, night sweats, etc.

That you can't find just by looking at the frequency of single words


Solution

  • try:

    =INDEX(LAMBDA(a, LAMBDA(x, QUERY(QUERY(x&" "&{QUERY(x, 
     "offset 1", ); ""}, "where not Col1 contains '×××'", ), 
     "select Col1,count(Col1) group by Col1 order by count(Col1) desc label count(Col1)''"))
     (FLATTEN(SPLIT(QUERY(FLATTEN({REGEXREPLACE(LOWER(a), "[\.,\?!\(\)]", ), 
     IFERROR(a/0, "×××")}),,9^9), " "))))(A1:A2))
    

    enter image description here


    update:

    =INDEX(LAMBDA(a, LAMBDA(x, QUERY(QUERY(x&" "&{QUERY(x, 
     "offset 1", ); ""}, "where not Col1 contains '×××'", ), 
     "select Col1,count(Col1) 
      where not Col1 matches '"&TEXTJOIN("|", 1, UNIQUE(LOWER(E2:E)))&"' 
      group by Col1 order by count(Col1) desc label count(Col1)''"))
     (FLATTEN(SPLIT(QUERY(FLATTEN({REGEXREPLACE(LOWER(a), "[\.,\?!\(\)]", ), 
     IFERROR(a/0, "×××")}),,9^9), " "))))(A1:A2))
    

    enter image description here