Search code examples
sqljsonamazon-redshiftnested-json

How to remove all \ from nested json in SQL Redshift?


I've got some problems with extracting values from nested json values in column. I've got a column of data with values that looks almost like nested json, but some of jsons got \ between values and I need to clean them.

JSON looks like this:

{"mopub_json":
   "{\"currency\":\"USD\",
   \"country\":\"US\",
   \"publisher_revenue\":0.01824}
"}

I need to get currency and publisher revenue as different columns and try this:

SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, "\t",  '')
FROM table_name)

I receive the next error:

[Amazon](500310) Invalid operation: column "\t" does not exist in events

When I'm trying this:

SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, chr(92),  '')
FROM table_name)

I receive

Invalid operation: JSON parsing error

When I'm trying to extract values without replacing , I'm receiving empty columns.

Thank you for your help!


Solution

  • So your json isn't valid. JSON doesn't allow multiline text strings but I expect that the issue. Based on your query I think you don't want a single key and string but the whole structure. The reason the that quotes are backslashed is because they are inside a string. The json should look like:

    {
      "mopub_json": {
        "currency": "USD",
        "country": "US",
        "publisher_revenue": 0.01824
      }
    }
    

    Then the SQL you have should work.