I am working on a system that writes text logs to a text column in PostgreSQL 12. I want to perform a simple search on a keyword (this part's easy!), but then I want to extract only a section of the text content that appears proximate to the located search keyword (to provide context).
The logs contain lines separated by carriage returns. Say I want to search for the keyword "exception" and bring back each line in the text column that contains the keyword. Or, maybe I want to return the line with the keyword as well as the line before and the line afterwards.
Is this possible using PostgreSQL query features alone? Or will I need to write other custom code to extract just the lines I'm interested in. (BTW, I don't currently use the tsvector feature.)
Any help would be greatly appreciated. Thanks!
[Edited to provide additional details.]
Say I have a table called LOG_RECORDS that contains a TEXT column called LOG_CONTENT. One particular row of this table contain the following in LOG_CONTENT:
2021-05-28 13:34:20.6101-04 - Loading Configurations
2021-05-28 13:34:31.6656-04 - Settings Initialization Complete: Test Device Configuration
2021-05-28 13:34:31.6656-04 - RecordWriter:Initialize
2021-05-28 13:34:32.3982-04 - Migrating configuration database data from schema version 21 to 21
2021-05-28 13:34:32.4590-04 - Add User successful: Export
2021-05-28 13:34:44.6605-04 - SerialConnectionManagedReadBufferBase - ReadTask - System.InvalidOperationException: The operation is not allowed on non-connected sockets.
at InTheHand.Net.Bluetooth.Msft.SocketBluetoothClient.GetStream()
at Sensus.CommunicationDevices.SerialConnections.SerialConnectionBT.ReadInternal(Byte[] buffer, Int32 maxBytes)
at Sensus.CommunicationDevices.SerialConnections.SerialConnectionManagedReadBufferBase.ReadTask()
2021-05-28 13:34:47.1807-04 - Communicator connection failed
2021-05-28 13:37:36.5337-04 - Linking
2021-05-28 13:37:37.2534-04 - Operation complete
To search for the keyword "exception" I can simply query the following:
SELECT * FROM LOG_RECORDS WHERE UPPER(LOG_CONTENT) LIKE '%EXCEPTION%';
But I would like to get back just the following (which represents the line containing the keyword "exception" plus the lines before and after), rather than the entire TEXT value:
2021-05-28 13:34:32.4590-04 - Add User successful: Export
2021-05-28 13:34:44.6605-04 - SerialConnectionManagedReadBufferBase - ReadTask - System.InvalidOperationException: The operation is not allowed on non-connected sockets.
at InTheHand.Net.Bluetooth.Msft.SocketBluetoothClient.GetStream()
If that's too challenging, just getting back the line with the keyword would be great:
2021-05-28 13:34:44.6605-04 - SerialConnectionManagedReadBufferBase - ReadTask - System.InvalidOperationException: The operation is not allowed on non-connected sockets.
Hopefully this makes sense. Thank you!
Here is a solution but not very performant:
select t.id, t2.arr || E'\n' ||t1.arr from log_records t
cross join lateral unnest(string_to_array(log_content,E'\n')) with ordinality t1(arr,e)
join lateral unnest(string_to_array(log_content,E'\n')) with ordinality t2(arr,e) on t1.e=t2.e+1
where t1.arr ~ 'Exception'
and t.log_content ~ 'Exception'
Also AFAIK use of Full text search
is not possible in this case because your log does not contain Exception
as a separate word or prefix of a word. It is present as a suffix or middle of a word and ts_query
support only full text search or prefix search.
Also add GIN
index on you column to make the search faster