Search code examples
sqljsonamazon-athenaprestotrino

SQL to find multiple occurence of a substring in a column of type varchar and its value


I am using Athena as database tables. I want to parse a column called 'line_items' in table orders. So each row in orders table consist of a customer order and line_items contain details of all the products included in the customer order. I want to find the product_id of each item in line_items and then based on these product_id, I need to join each product_id with its details in product_details table.

The data type of line_items is varchar, and it cannot be changed.

The sample value of one of the order is as follows:

   [{'id': 13942775087176, 'admin_graphql_api_id': 'gid://shop/LineItem/13942775087176', 'fulfillable_quantity': 0, 'fulfillment_service': 'manual', 'fulfillment_status': 'fulfilled', 'gift_card': False, 'grams': 585, 'name': 'Military Green Comfort Chino Pants - 36', 'pre_tax_price': 44.89, 'pre_tax_price_set': {'shop_money': {'amount': '44.89', 'currency_code': 'USD'}, 'presentment_money': {'amount': '44.89', 'currency_code': 'USD'}}, 'price': 44.89, 'price_set': {'shop_money': {'amount': 44.89, 'currency_code': 'USD'}, 'presentment_money': {'amount': 44.89, 'currency_code': 'USD'}}, 'product_exists': True, 'product_id': 6633367306312, 'properties': [{'name': '_igTestGroups', 'value': 'fee1aa1f534b,71aca8e2923f'}, {'name': '_igTestGroup', 'value': '33c7ca5a-faf4-452e-adba-fee1aa1f534b'}], 'quantity': 1, 'requires_shipping': True, 'sku': 'TCT4702MGRN36', 'tax_code': 'PC040100', 'taxable': True, 'title': 'Military Green Comfort Chino Pants', 'total_discount': 0.0, 'total_discount_set': {'shop_money': {'amount': 0.0, 'currency_code': 'USD'}, 'presentment_money': {'amount': 0.0, 'currency_code': 'USD'}}, 'variant_id': 39507850657864, 'variant_inventory_management': 'shop', 'variant_title': '36', 'vendor': 'True Classic', 'tax_lines': [{'channel_liable': False, 'price': 2.51, 'price_set': {'shop_money': {'amount': 2.51, 'currency_code': 'USD'}, 'presentment_money': {'amount': 2.51, 'currency_code': 'USD'}}, 'rate': 0.056, 'title': 'AZ STATE TAX'}, {'channel_liable': False, 'price': 0.31, 'price_set': {'shop_money': {'amount': 0.31, 'currency_code': 'USD'}, 'presentment_money': {'amount': 0.31, 'currency_code': 'USD'}}, 'rate': 0.007, 'title': 'AZ COUNTY TAX'}, {'channel_liable': False, 'price': 1.03, 'price_set': {'shop_money': {'amount': 1.03, 'currency_code': 'USD'}, 'presentment_money': {'amount': 1.03, 'currency_code': 'USD'}}, 'rate': 0.023, 'title': 'AZ CITY TAX'}], 'duties': [], 'discount_allocations': []}, {'id': 13942775119944, 'admin_graphql_api_id': 'gid://shop/LineItem/13942775119944', 'fulfillable_quantity': 0, 'fulfillment_service': 'manual', 'fulfillment_status': 'fulfilled', 'gift_card': False, 'grams': 585, 'name': 'Khaki Comfort Chino Pants - 36', 'pre_tax_price': 44.89, 'pre_tax_price_set': {'shop_money': {'amount': '44.89', 'currency_code': 'USD'}, 'presentment_money': {'amount': '44.89', 'currency_code': 'USD'}}, 'price': 44.89, 'price_set': {'shop_money': {'amount': 44.89, 'currency_code': 'USD'}, 'presentment_money': {'amount': 44.89, 'currency_code': 'USD'}}, 'product_exists': True, 'product_id': 6633366388808, 'properties': [{'name': '_igTestGroups', 'value': 'fee1aa1f534b,71aca8e2923f'}, {'name': '_igTestGroup', 'value': '33c7ca5a-faf4-452e-adba-fee1aa1f534b'}], 'quantity': 1, 'requires_shipping': True, 'sku': 'TCT4702KHAKI36', 'tax_code': 'PC040100', 'taxable': True, 'title': 'Khaki Comfort Chino Pants', 'total_discount': 0.0, 'total_discount_set': {'shop_money': {'amount': 0.0, 'currency_code': 'USD'}, 'presentment_money': {'amount': 0.0, 'currency_code': 'USD'}}, 'variant_id': 39507846725704, 'variant_inventory_management': 'shop', 'variant_title': '36', 'vendor': 'True Classic', 'tax_lines': [{'channel_liable': False, 'price': 2.51, 'price_set': {'shop_money': {'amount': 2.51, 'currency_code': 'USD'}, 'presentment_money': {'amount': 2.51, 'currency_code': 'USD'}}, 'rate': 0.056, 'title': 'AZ STATE TAX'}, {'channel_liable': False, 'price': 0.31, 'price_set': {'shop_money': {'amount': 0.31, 'currency_code': 'USD'}, 'presentment_money': {'amount': 0.31, 'currency_code': 'USD'}}, 'rate': 0.007, 'title': 'AZ COUNTY TAX'}, {'channel_liable': False, 'price': 1.03, 'price_set': {'shop_money': {'amount': 1.03, 'currency_code': 'USD'}, 'presentment_money': {'amount': 1.03, 'currency_code': 'USD'}}, 'rate': 0.023, 'title': 'AZ CITY TAX'}], 'duties': [], 'discount_allocations': []}]

I wrote a query which returns me id of first line_item, but I need all the ids in the line_items. How can I find the SQL solution for this?

Query written by me:

select split_part(SUBSTRING(line_items, posa + 6, 20), ',', 1) as line_item_id from (
 select POSITION('''id'': ' IN line_items) as posa, substring(line_items, POSITION('''id'': ' IN line_items)+6, POSITION(',' IN line_items)) AS id_value, line_items
 from orders where id in ('45245','5463556','64874')
 )

Output of my query is:

13942775087176

I want id of each line-item and then later on, also need to do summation of amount field in the line_items.


Solution

  • Your data is JSON so you can process as JSON by parsing, casting to array of JSON and then using unnest:

    -- sample data
    with dataset(line_items) as (
        values (your_json_str_from_post)
    )
    
    -- query
    select json_extract_scalar(m, '$.id') product_id,
           json_extract_scalar(m, '$.quantity') quantity
    from dataset,
        unnest(cast(json_parse(line_items) as array(json))) as t(m);
    

    Output:

    product_id quantity
    13942775087176 1
    13942775119944 1