I have a database with a table called activity that has a column called detail, which holds this unfortunate representation of key/value pairs:
Key ID=[813],\n
Key Name=[Name of Key],\n
Some Field=[2732],\n
Another Field=[2751],\n
Description=[A text string here],\n
Location=[sometext],\n
Other ID=[2360578],\n
As the formatting above suggests, there is one value per line, and \n is a newline character, so there's always one extra trailing newline. I'm trying to avoid having an external program process this data, so I'm looking into PostgreSQL's regex functions. The goal is to convert this to a jsonb or hstore column; I don't really care which.
Schema for the table is like:
CREATE TABLE activity
(
id integer NOT NULL,
activity_type integer NOT NULL,
ts timestamp with time zone,
detail text NOT NULL,
details_hstore hstore,
details_jsonb jsonb,
CONSTRAINT activity_pkey PRIMARY KEY (id)
);
So I'd like to run an UPDATE where I update details_jsonb or details_hstore with the processed data from detail.
This:
select regexp_matches(activity.detail, '(.*?)=\[(.*?)\],[\r\n]', 'g') as val from activity
gets me these individual rows (this is from pgadmin, I assume these are all strings):
{"Key ID",813}
{"Key Name","Name of Key"}
{"Some Field",2732}
{"Another Field",2751}
{Description,"A text string here"}
{Location,sometext}
{"Other ID",2360578}
I'm not a regex whiz, but I think I need some kind of grouping. Also, that returns a text array of some kind, but what I really want is something like this for jsonb:
{"Key ID": "813", "Key Name": "Name of Key"}
or even better, if it's a number only then
{"Key ID": 813, "Key Name": "Name of Key"}
and/or the equivalent for hstore.
I feel like I'm a number of regex-in-postgres concepts away from this goal.
Is this kind of regex update too much to get working in an update, i.e. update activity set details_jsonb = [[insane regex here]]? hstore is also an option (though I like that jsonb has types), so if it's easier to go to an hstore function like hstore(text[]) that's fine too.
Am I crazy and do I need to just write an external process not-in-postgresql that does this?
I would first split the single value into multiple lines. Each line can then be converted to an array, and those arrays can be aggregated into a JSON value:
select string_to_array(regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g'), '=') as element
from activity a, regexp_split_to_table(a.detail, ',\s*\n') t (line)
This returns the following:
element
------------------------------------
{"Key ID",[813]}
{"Key Name","[Name of Key]"}
{"Some Field",[2732]}
{"Another Field",[2751]}
{Description,"[A text string here]"}
{Location,[sometext]}
{"Other ID",[2360578]}
{}
The regex used to split the detail value into lines might need some improvement, though.
The regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g') is there to trim the values before converting them to an array.
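The arrays above still carry the square brackets, and string_to_array() would produce more than two elements for a value that itself contains an = sign. A possible variation is sketched below. It assumes PostgreSQL 10 or later (for regexp_match()) and that every line really has the form key=[value]; the pattern is my own guess at the format, not something taken from the real data:

```sql
-- Sketch: capture the key and the text inside the brackets in one step.
-- Group 1 = everything before the first "=", group 2 = content of [...].
select regexp_match(t.line, '^\s*([^=]+)=\[(.*)\]\s*$') as element
from activity a,
     regexp_split_to_table(a.detail, ',\s*\n') t (line)
where t.line ~ '=';   -- skips the empty piece after the trailing newline
```

With the sample data this should give two-element arrays such as {"Key ID",813}, without the brackets. Lines that do not match the pattern come back as NULL, so they are easy to spot before anything is aggregated.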
Now this can be aggregated into a single JSON value, or each line can be converted into a single hstore value (unfortunately there is no hstore_agg()):
with activity (detail) as (
values (
'Key ID=[813],
Key Name=[Name of Key],
Some Field=[2732],
Another Field=[2751],
Description=[A text string here],
Location=[sometext],
Other ID=[2360578],
')
), elements (element) as (
select string_to_array(regexp_replace(t.line, '(^\s+)|(\s+$)', '', 'g'), '=')
from activity a, regexp_split_to_table(a.detail, ',\s*\n') t (line)
)
select json_agg(jsonb_object(element))
from elements
where cardinality(element) > 1 -- this removes the empty line
The above returns a JSON array with one object per key/value pair:
[ { "Key ID" : "[813]" },
{ "Key Name" : "[Name of Key]" },
{ "Some Field" : "[2732]" },
{ "Another Field" : "[2751]" },
{ "Description" : "[A text string here]" },
{ "Location" : "[sometext]" },
{ "Other ID" : "[2360578]" }
]
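If a single object per row is preferred over an array of one-pair objects, the same splitting idea can be put directly into an UPDATE. The following is only a sketch under a few assumptions: PostgreSQL 10 or later (for regexp_match()), every line of the form key=[value], and "digits only" as the rule for what counts as a number. The alias m (parts) is just a name chosen for this example.

```sql
-- Sketch: build one jsonb object per row, typing digit-only values as numbers.
update activity a
set details_jsonb = (
    select jsonb_object_agg(
               m.parts[1],
               case
                   when m.parts[2] ~ '^\d+$' then to_jsonb(m.parts[2]::numeric)
                   else to_jsonb(m.parts[2])
               end)
    from regexp_split_to_table(a.detail, ',\s*\n') as t (line),
         lateral regexp_match(t.line, '^\s*([^=]+)=\[(.*)\]\s*$') as m (parts)
    where m.parts is not null          -- drops the empty trailing line
);

-- hstore variant (all values stay text); needs the hstore extension,
-- which the details_hstore column already implies is installed.
update activity a
set details_hstore = (
    select hstore(array_agg(m.parts[1]), array_agg(m.parts[2]))
    from regexp_split_to_table(a.detail, ',\s*\n') as t (line),
         lateral regexp_match(t.line, '^\s*([^=]+)=\[(.*)\]\s*$') as m (parts)
    where m.parts is not null
);
```

A few things to keep in mind: a value such as 007 would be stored as the number 7 by the jsonb version, duplicate keys within one detail value keep only one entry, and a value that contains a comma followed by a newline would be split in the wrong place. It is probably worth running the inner select on a handful of rows before running the update on the whole table.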