Search code examples
stringpostgresqlsplitpgadmin

How to split name data in the same column in PostgreSQL?


I am new to PostgreSQL and am using PGADMIN 4 on a Mac. I have one column of imported data that has some usernames, sometimes a last name and mostly a first and last name in the same column.

I care more to be able to query and count the most occurrences of a name in the column. I will be able to determine by results if it is a first or last for my need. Listing the first 50 should do it. Please assist with the specific code including addressing the table and column.

Have played with this, but need more:

select surname, count(*) from atreedata
    group by surname
    order by count(*) desc limit 40;

Works great with only one name! I need the most common names listed by name and count.

Common Column Example:

John Smith
jsmith3
Stacey123
Bob Smith
Jones

Solution

  • So, if I understand it correctly, you just need to find the most numerous words in surname column.

    There's a built-in function regexp_split_to_table that can split strings to words and creates rows from those words. So:

    select surname_word, count(*) as surname_word_count
    from (
        select regexp_split_to_table(surname, E'\\s+') as surname_word
        from atreedata
    ) as surname_words
    group by surname_word
    order by surname_word_count desc
    limit 40;