Search code examples
google-bigquerybigquery-udf

summarize word count for a timeframe


I have the below table which stores the response text and the keyword search associated with it.

create table nlp.search(response string, words string,inquiry_time timestamp);

insert into nlp.search values('how to reset password','reset word password',TIMESTAMP ("2021-09-19 05:30:00+00")); insert into nlp.search values('how to reset password','reset passphrase',TIMESTAMP ("2021-09-20 07:30:00+00")); insert into nlp.search values('how to reset password','password',TIMESTAMP ("2021-09-16 08:30:00+00")); insert into nlp.search values('how to reset password','reset',TIMESTAMP ("2021-09-14 08:30:00+00"));

I want to provide a summary report in this format response and the count of each individual words associated with it.

response individual_word_count
how to reset password reset(3) word(1) password(2) passphrase(1)

also the timestamp column inquiry_time can be passed to narrow down the date range and the summary values must be computed accordingly

e.g for timeframe filter 2021-09-19 till 2021-09-20

response individual_word_count
how to reset password reset(2) word(1) password(1) passphrase(1)

can this be accomplished using a view?


Solution

  • Use below

    select response, word, count(1) ndividual_word_count
    from `nlp.search`, 
    unnest(split(words, ' ')) word
    where date(inquiry_time) between '2021-09-19' and '2021-09-20'
    group by response, word       
    

    if applied to sample data in your question - output is

    enter image description here

    I Need to display the word and count in 1 single column

    use below then

    select response, 
      string_agg(format('%s (%i)', word, individual_word_count)) counts
    from (
      select response, word, count(1) individual_word_count
      from `nlp.search`, 
      unnest(split(words, ' ')) word
      where date(inquiry_time) between '2021-09-19' and '2021-09-20'
      group by response, word
    )
    group by response
    

    with output

    enter image description here