Search code examples
sqlregexpostgresql

How to Parse a field with Pattern “userid-date-amount” in postgresql


I’m working on a query where I need to parse Description field that follow the pattern “userid-date-amount-” For example, a valid value would be: 222826-2022/07/26-32,700,000-

However, my current implementation also extracts values like the following:

Tracking code: 1160685022- Date: 2022/07/26- Time: 21:02- Last 4 digits: None- First 4 digits: None- Amount: 350 thousand Dollars- To Paypal

I want to modify my parsing logic to only consider values that strictly adhere to the “userid-date-amount-” format.

SELECT
  i."Id",
  i."Description",
  unnest(regexp_matches(i."Description", '(\d+)-(.*?)-(.*)-')) AS extracted_data
FROM "public"."Invoices" i

I also need to break down the original value into three separate fields like this: 222826, 2022/07/26, and 32,700,000.


Solution

  • You could make the pattern more specific and match the date like pattern with digits and forward slashes.

    For the amount you could start with a single digits and match optional digits or comma's (Or make even that part more specific according to the allowed format)

    (\d+)-(\d{4}/\d\d/\d\d)-(\d[\d,]*)-
    

    See a regex demo

    If the leading digits should not be a partial word match, you could start with a word boundary \y(\d+)