id | value |
---|---|
123 | {78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}} |
333 | {5mdzrZ={"Sent": 1, "Respond": 1, "NoResponset": 1}} |
Given the table above, I am trying to extract the "Sent" value... In cases where there are multiple sent values then I want to take the max.
I have tried using json_extract, json_extract_scalar, json_parse and multiple other functions in SQL but nothing seems to work. I keep getting NULL values in all my attempts.
The expected outcome given the example above should be:
id | Sent |
---|---|
123 | 77 |
333 | 1 |
I think one way to approach this is by first doing a CROSS JOIN UNNEST to split the value column by 78kfcX, 97Facz, 5mdzrZ ids. And then extracting the sent value from there and taking the max, grouped by the id column.
Attempted code:
SELECT
id,
json_extract_scalar(value, '$.Sent') AS 'sent'
FROM table
JSON_PARSE(value) returns the following error:
Cannot convert value to JSON: '{78kfcX={"Sent": 77, "Respond": 31, "NoResponse": 31}, 97Facz={"Sent": 45, "Respond": 31, "NoResponse": 31}}'
answer for original question
The input string itself is not in JSON format. Therefore, we need to first extract {"Sent": 77, "Respond": 31, "NoResponse": 31}
out by regular expression, and then apply JSON_EXTRACT_SCALAR() function on it.
The following regex pattern can extract the attribute maps out from the input (Regex101 verification):
\{".*?\}
With alll above tricks, in Presto, you can get the result you want in the following steps:
Step 1. Use regexp_extract_all() function to extract all attribute maps out of the input string into an array.
Step 2. Apply JSON_extract_scalar() to each element of the array and extract Sent
part out
Step 3. Use array_max() function to get the max value you want.
Here is the query in Presto:
SELECT
id,
APPAY_MAX(
TRANSFORM(
REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
)
) AS sent
FROM
input_table
id | Sent |
---|---|
123 | 77 |
333 | 1 |
answer for the follow-up If we want to calculation sum instead of max from the array, we can use reduce() function and lamda expression to do that.
Here is the query:
SELECT
id,
REDUCE(
TRANSFORM(
REGEXP_EXTRACT_ALL(value, '\{".*?\}'),
v -> CAST(JSON_EXTRACT_SCALAR(v, '$.Sent') AS INT)
), 0, (s,x) -> s + x, s -> s
) AS sent
FROM
input_table
id | Sent |
---|---|
123 | 122 |
333 | 1 |