Search code examples
mysqlsqltextstring-lengthword-list

MYSQL How to count the length of each word in text column


Let's assume I have a simple table in mysql with a TEXT column named words which include arbitrary text, for example 'The big brown dog jumped over the lazy fox'.

How can I find the length of each word in the text using mysql query ?

The results should be: 3 3 5 3 6 4 3 4 3 - or something similar. Can be sorted, or can include the words themselves. Couldn't find my self to do that. I could find some samples of counting the number of words. But I need the LENGTH of each word.


Solution

  • Your example doesn't show punctuation, so I'm going to assume you just want the value split up by spaces (so The fox. will give 3 for The and 4 for fox.)

    Given:

    CREATE TABLE `foo` (
        `bar` text COLLATE utf8_unicode_ci DEFAULT NULL
    );
    insert into foo values ('Lorem ipsum dolor sit amet'),('consectetur adipiscing elit'),('sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'),('Ut enim ad minim veniam'),('quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat'),('Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur'),('Excepteur sint occaecat cupidatat non proident'),('sunt in culpa qui officia deserunt mollit anim id est laborum');
    

    you can use a recursive CTE to get all the words in each text column:

    with recursive words (word,remaining,word_number,full_string) as (
        select
            substring_index(bar,' ',1) word,
            if(instr(bar,' '),right(bar,length(bar)-instr(bar,' ')),'') remaining,
            1 word_number,
            bar full_string
        from foo
    
        union all
    
        select
            substring_index(remaining,' ',1) word,
            if(instr(remaining,' '),right(remaining,length(remaining)-instr(remaining,' ')),'') remaining,
            word_number+1 word_number,
            full_string
        from words
        where length(remaining)
    )
    select length(word) from words;
    

    For earlier database versions without support for recursive common table expressions, you have to assume some maximum number of words and create a subquery that has rows with numbers from 1 up to that maximum. For example, given a maximum of 256, you can say:

    select i*64+j*16+k*4+l+1 n
    from (select 0 l union all select 1 union all select 2 union all select 3) l
    cross join (select 0 k union all select 1 union all select 2 union all select 3) k
    cross join (select 0 j union all select 1 union all select 2 union all select 3) j
    cross join (select 0 i union all select 1 union all select 2 union all select 3) i;
    

    Or if you prefer, you can just use a long list of (select 1 n union all select 2 union all select 3 ... union all select 256).

    Then join that to your table and extract each word:

    select
        length(substring_index(substring_index(bar,' ',word_number),' ',-1)) word_length,
        substring_index(substring_index(bar,' ',word_number),' ',-1) word,
        word_number,
        bar full_string
    from (
        select i*64+j*16+k*4+l+1 word_number
        from (select 0 l union all select 1 union all select 2 union all select 3) l
        cross join (select 0 k union all select 1 union all select 2 union all select 3) k
        cross join (select 0 j union all select 1 union all select 2 union all select 3) j
        cross join (select 0 i union all select 1 union all select 2 union all select 3) i
    ) n
    join foo on word_number <= length(bar)-length(replace(bar,' ',''))+1;