Search code examples
pythonduckdb

Keep rows where a field of a list[struct] column contains a message


Say I have the following data:

import duckdb
rel = duckdb.sql("""
    FROM VALUES
        ([{'a': 'foo', 'b': 'bta'}]),
        ([]),
        ([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
        df(my_col)
    SELECT *
""")

which looks like this:

┌──────────────────────────────────────────────┐
│                    my_col                    │
│        struct(a varchar, b varchar)[]        │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}]                       │
│ []                                           │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘

I would like to keep all rows where for any of the items in one of the elements of 'my_col', field 'a' contains the substring 'bt'

So, expected output:

┌──────────────────────────────────────────────┐
│                    my_col                    │
│        struct(a varchar, b varchar)[]        │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}]                       │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘

How can I write a SQL query to do that?


Solution

  • Maybe list_sum() the bools or list_bool_or()?

    duckdb.sql("""
    FROM VALUES
        ([{'a': 'foo', 'b': 'bta'}]),
        ([]),
        ([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
        df(my_col)
    SELECT *
    WHERE list_bool_or(['bt' in s.b for s in my_col])
    """)
    
    ┌──────────────────────────────────────────────┐
    │                    my_col                    │
    │        struct(a varchar, b varchar)[]        │
    ├──────────────────────────────────────────────┤
    │ [{'a': foo, 'b': bta}]                       │
    │ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
    └──────────────────────────────────────────────┘
    

    The list comprehension is the same as list_apply(my_col, s -> 'bt' in s.b)