Search code examples
sqlpostgresqlloopspattern-matchingstring-function

In SQL, how to check if a string is the substring of any other string in the same table?


I have table full of strings (TEXT) and I like to get all the strings that are substrings of any other string in the same table. For example if I had these three strings in my table:

WORD        WORD_ID
cup         0
cake        1
cupcake     2

As result of my query I would like to get something like this:

WORD        WORD_ID        SUBSTRING        SUBSTRING_ID
cupcake     2              cup              0
cupcake     2              cake             1 

I know that I could do this with two loops (using Python or JS) by looping over every word in my table and match it against every word in the same table, but I'm not sure how this can be done using SQL (PostgreSQL for that matter).


Solution

  • Use self-join:

    select w1.word, w1.word_id, w2.word, w2.word_id
    from words w1
    join words w2
    on w1.word <> w2.word
    and w1.word like format('%%%s%%', w2.word);
    
      word   | word_id | word | word_id 
    ---------+---------+------+---------
     cupcake |       2 | cup  |       0
     cupcake |       2 | cake |       1
    (2 rows)