Tags: javascript, regex, regex-lookarounds, regex-group

Parsing text using regex in JavaScript


I'm stuck parsing the following text into an object. I have written two separate regexes, but I want to use only one. Below are the sample text and my regex patterns.

PAYER:\r\n\r\n   MCNA \r\n\r\nPROVIDER:\r\n\r\n   MY KHAN \r\n   Provider ID: 115446397114\r\n   Tax ID: 27222193992\r\n\r\nINSURED:\r\n\r\n   VICTORY OKOYO\r\n   Member ID: 60451158048\r\n   Birth Date: 05/04/2008\r\n   Gender: Male\r\n\r\nCOVERAGE TYPE:\r\n\r\n   Dental Care

REGEX:

    const re = new RegExp('(.*?):\r\n\r\n(.*?)(?:\r\n|$)', 'g');
    const re2 = new RegExp('(.*?):(.*?)(?:\r\n|$)', 'g');

Expected result:

{
  payer: 'MCNA',
  provider: 'MY KHAN'
}

Solution

  • This turns your input into an object that contains all key/value pairs:

    const input = 'PAYER:\r\n\r\n   MCNA \r\n\r\nPROVIDER:\r\n\r\n   MY KHAN \r\n   Provider ID: 115446397114\r\n   Tax ID: 27222193992\r\n\r\nINSURED:\r\n\r\n   VICTORY OKO\r\n   Member ID: 60451158048\r\n   Birth Date: 05/04/2009\r\n   Gender: Male\r\n\r\nCOVERAGE TYPE:\r\n\r\n   Dental Care';
    
    const result = Object.fromEntries(input
      .replace(/([^:]+):\s+([^\n\r]+)\s*/g, (m, c1, c2) => c1.toLowerCase() + '\r' + c2 + '\n')
      .split('\n')
      .filter(Boolean)
      .map(item => item.trim().split('\r'))
    );
    console.log(result);

    Output:

    {
      "payer": "MCNA",
      "provider": "MY KHAN",
      "provider id": "115446397114",
      "tax id": "27222193992",
      "insured": "VICTORY OKO",
      "member id": "60451158048",
      "birth date": "05/04/2009",
      "gender": "Male",
      "coverage type": "Dental Care"
    }
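The `.replace()` step is the least obvious part of the pipeline. This small sketch runs only that step on the first three pairs of the input (the shortened `sample` string is an assumption made for brevity) to show what `.split()` receives: each key is lowercased, a `\r` sits between key and value, and every pair ends in `\n`. Note the trailing space kept after `MCNA` and `MY KHAN`, which is why the final `.map()` calls `trim()`.

```javascript
// First three pairs of the sample input only (shortened for illustration).
const sample = 'PAYER:\r\n\r\n   MCNA \r\n\r\nPROVIDER:\r\n\r\n   MY KHAN \r\n   Provider ID: 115446397114';

// The same .replace() call as in the answer, run on its own.
const intermediate = sample.replace(
  /([^:]+):\s+([^\n\r]+)\s*/g,
  (m, c1, c2) => c1.toLowerCase() + '\r' + c2 + '\n'
);

// JSON.stringify makes the \r and \n separators visible.
console.log(JSON.stringify(intermediate));
// "payer\rMCNA \nprovider\rMY KHAN \nprovider id\r115446397114\n"
```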
    

    Explanation:

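    • Object.fromEntries() -- converts an array of [key, value] pairs into an object, e.g. [ ['a', 1], ['b', 2] ] => {a: 1, b: 2}
    • .replace() with regex /([^:]+):\s+([^\n\r]+)\s*/g -- two capture groups: one for the key (everything up to the colon), one for the value (the rest of the first non-blank line after it); the trailing \s* swallows the line breaks and indentation before the next key
    • replacement callback c1.toLowerCase() + '\r' + c2 + '\n' -- lowercases the key, puts a \r between key and value, and ends each pair with \n
    • .split('\n') -- splits the string into one item per pair
    • .filter(Boolean) -- removes empty items (the final \n leaves an empty string at the end)
    • .map(item => item.trim().split('\r')) -- trims stray spaces and turns each item into [key, value], i.e. the flat array becomes a 2D array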
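Since the question asks for a single regex, here is a sketch of an alternative that skips the `\r`/`\n` placeholder round trip and reads the pairs directly with `String.prototype.matchAll()`. It assumes every key sits on one line before a colon and every value fits on one line; `pairRe` is just an illustrative name. Excluding `\r` and `\n` from the key's character class keeps a key from running across line breaks, and `\s*` after the colon covers both the blank-line layout (`PAYER:`) and the inline layout (`Provider ID:`).

```javascript
// Sample text in the same shape as the question's input.
const input = 'PAYER:\r\n\r\n   MCNA \r\n\r\nPROVIDER:\r\n\r\n   MY KHAN \r\n   Provider ID: 115446397114\r\n   Tax ID: 27222193992\r\n\r\nINSURED:\r\n\r\n   VICTORY OKO\r\n   Member ID: 60451158048\r\n   Birth Date: 05/04/2009\r\n   Gender: Male\r\n\r\nCOVERAGE TYPE:\r\n\r\n   Dental Care';

// group 1: the key, i.e. the text on one line before a colon
// \s*    : skips the spaces or blank lines between the colon and the value
// group 2: the rest of the value's line
const pairRe = /([^:\r\n]+):\s*([^\r\n]+)/g;

// matchAll needs the g flag; each match is [fullMatch, key, value].
const result = Object.fromEntries(
  Array.from(input.matchAll(pairRe), ([, key, value]) => [
    key.trim().toLowerCase(), // drop indentation, lowercase the key
    value.trim(),             // drop trailing spaces such as in "MCNA "
  ])
);

console.log(result);
```

Because each match already yields the key and value as separate groups, there is no need for temporary separator characters, and a value containing `\r` could never be split by mistake.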

    You could add one more filter after the .map() to keep only keys of interest.
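For example, to get exactly the object from the question, a filter placed between `.map()` and `Object.fromEntries()` can keep only the wanted keys. In this sketch `pairs` is a shortened, hand-written stand-in for what `.map()` returns, and `wanted` is a hypothetical set of keys.

```javascript
// Stand-in for the [key, value] pairs produced by .map() (shortened).
const pairs = [
  ['payer', 'MCNA'],
  ['provider', 'MY KHAN'],
  ['provider id', '115446397114'],
  ['tax id', '27222193992'],
  ['insured', 'VICTORY OKO'],
];

// Hypothetical list of keys of interest.
const wanted = new Set(['payer', 'provider']);

// The extra filter: keep a pair only if its key is wanted.
const picked = Object.fromEntries(pairs.filter(([key]) => wanted.has(key)));

console.log(picked); // { payer: 'MCNA', provider: 'MY KHAN' }
```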