Search code examples
sqldb2ibm-midrange

Splitting a string based on delimiter using SQL


I have a "flat file" with structure as below:

machineCode,Key,Ip_Name_No,Share_Percent,Account_Name,Account_No                       
"ygh048GT",4767,534293748,"100.00","cderfgdsc Publishing International Ltd","160102040"  
"xcd064HW",6380,65424090,"100.00","dascdfrgh snm skion","00090382478"                    
"000065AN",6402,65424090,"100.00","xcdertn,john sean","00090382478"                    
.....

The first row are the column headings. As can be seen, the fields are separated by a comma. The requirement is to split the single string into separate fields.

This could be done by excel and then uploaded to a DB table using the data to columns option with comma as delimiter but the Account_Name field can contain commas within the values itself.

So, I came up with the below SQL. Question is, does this look correct ? Also, there must be some easier way to do this, any suggestions ?

WITH POS AS (
select 
LOCATE_IN_STRING ( DATA , ',' , 2 )  - 1 AS TUNECODE_END ,
LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1)  - 1 AS WORKKEY_END,
LOCATE_IN_STRING( DATA , ','  ,   (LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1) + 1) ) - 1 AS IPNN_END,
LOCATE_IN_STRING( DATA , ','  , (LOCATE_IN_STRING( DATA , ','  ,   (LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1) + 1) )  + 1) ) - 1 AS PERC_END,

CASE WHEN
 SUBSTR ( DATA , 
(
LOCATE_IN_STRING ( DATA , ',' , 
(LOCATE_IN_STRING( DATA , ','  , (LOCATE_IN_STRING( DATA , ','  ,   (LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1) + 1) )  + 1) ) + 1 ) ) + 1),
1) <> '"' 
THEN
LOCATE_IN_STRING ( DATA , ',' ,  
(LOCATE_IN_STRING ( DATA , ',' , 
(LOCATE_IN_STRING( DATA , ','  , (LOCATE_IN_STRING( DATA , ','  ,   (LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1) + 1) )  + 1) ) + 1 ) ) + 1)) - 1 

ELSE

LOCATE_IN_STRING ( DATA , ',' , 
(LOCATE_IN_STRING( DATA , ','  , (LOCATE_IN_STRING( DATA , ','  ,   (LOCATE_IN_STRING (DATA, ',' ,  LOCATE_IN_STRING ( DATA , ',' , 2 ) + 1) + 1) )  + 1) ) + 1 ) ) - 1

END AS ACNAME_END,
RRN(P) ROWN

FROM PLDWRK P 

) SELECT 

CAST ( SUBSTR( DATA , 1, TUNECODE_END ) AS CHAR(25))AS MACHINECODE ,
CAST ( SUBSTR( DATA , TUNECODE_END + 2 ,  WORKKEY_END - (TUNECODE_END + 1)  ) AS DEC(12,0))  AS KEY,
CAST(SUBSTR( DATA , WORKKEY_END + 2,  IPNN_END - (WORKKEY_END + 1) ) AS DEC(12, 0 )) AS IP_NN,
CAST (SUBSTR( DATA, IPNN_END + 2, PERC_END - (IPNN_END + 1)) AS CHAR(8))AS PERCENTAGE,
CAST (SUBSTR( DATA, PERC_END + 2, ACNAME_END - (PERC_END + 1)) AS CHAR(100)) AS ACCOUNT_NAME,
CAST (SUBSTR( DATA, ACNAME_END + 2 ) AS CHAR(30)) as ACCOUNT_NUMBER

 FROM PLDWRK P JOIN POS ON ROWN = RRN(P) 

Solution

  • Lots of things are easier than raw SQL...

    • Why not simply use Copy From Import File (CPYFRMIMPF) that's what it's designed for.
      CPYFRMIMPF FROMSTMF('/inbound/somedata.csv') TOFILE(MYLIB/MYTABLE) MBROPT(*REPLACE) RCDDLM(*CRLF) DTAFMT(*DLM) STRDLM(*DBLQUOTE) RMVCOLNAM(*YES)
      You'll have to transfer the stream data into the IFS (where it really belongs) instead of a DB table.

    • IBM's Access Client Solutions (ACS) includes data transfer functionality that can understand .CSV files. This can be automated and can in fact run on either a PC or the IBM i itself.

    • Another great option would be an RPG program, back in 2008 Scott Klement wrote a CSV parser in RPG. He's since enhanced it to make it easier to use by taking advantage of RPG's DATA-INTO op-code.

    • Lastly it's 2023...node.js, PHP, Python are all available on the IBM i and all of them have libraries/packages to handle CSV and write to a DB table.