sql sql-server insert common-table-expression

SQL Server using CTE for Insert

I have an asterisk delimited flat file, I import it into a single column in a table to check for errors. Once happy, I then want to split it up and insert it into a table with a 17 columns that correspond to each field in the delimited single column.

The column looks like this

5*0121**01300421142116*CW*WAREHOUSE AND PREMISES*8497395000*01-APR-2016****18017262000*25-APR-2016*096G*26621354212*25-APR-2016*17-MAY-2016*

I use this to split it and insert it:

WITH cte AS 
(
    SELECT 
        check_content,
        SUBSTRING(check_content, 1, ISNULL(NULLIF(CHARINDEX('*', check_content), 0) - 1, LEN(check_content))) sString,
        NULLIF(CHARINDEX('*', check_content), 0) cIndex,
        1 Lvl
    FROM   
        delimiter_check_reval T
    UNION ALL
    SELECT 
        check_content,
        SUBSTRING(check_content, cindex + 1, ISNULL(NULLIF(CHARINDEX('*', check_content, cindex + 1), 0) - 1 - cindex, LEN(check_content))),
        NULLIF(CHARINDEX('*', check_content, cindex + 1), 0),
        lvl + 1
    FROM   
        cte
    WHERE  
        cindex IS NOT NULL
)
INSERT INTO raw_epoch_historic
            (revaluation_id,
             billing_authority_code,
             ndr_community_code,
             ba_reference_number,
             primary_and_secondary_description_code,
             primary_description_text,
             unique_address_reference_number,
             effective_date,
             composite_indicator,
             rateable_value,
             appeal_settlement_code,
             assessment_reference,
             list_alteration_date,
             scat_code_and_suffix,
             case_number,
             historic_from_date,
             historic_to_date)
SELECT Max(CASE
             WHEN lvl = 1 THEN sstring
           END) val1,
       Max(CASE
             WHEN lvl = 2 THEN sstring
           END) val2,
       Max(CASE
             WHEN lvl = 3 THEN sstring
           END) val3,
       Max(CASE
             WHEN lvl = 4 THEN sstring
           END) val4,
       Max(CASE
             WHEN lvl = 5 THEN sstring
           END) val5,
       Max(CASE
             WHEN lvl = 6 THEN sstring
           END) val6,
       Max(CASE
             WHEN lvl = 7 THEN sstring
           END) val7,
       Max(CASE
             WHEN lvl = 8 THEN sstring
           END) val8,
       Max(CASE
             WHEN lvl = 9 THEN sstring
           END) val9,
       Max(CASE
             WHEN lvl = 10 THEN sstring
           END) val10,
       Max(CASE
             WHEN lvl = 11 THEN sstring
           END) val11,
       Max(CASE
             WHEN lvl = 12 THEN sstring
           END) val12,
       Max(CASE
             WHEN lvl = 13 THEN sstring
           END) val13,
       Max(CASE
             WHEN lvl = 14 THEN sstring
           END) val14,
       Max(CASE
             WHEN lvl = 15 THEN sstring
           END) val15,
       Max(CASE
             WHEN lvl = 16 THEN sstring
           END) val16,
       Max(CASE
             WHEN lvl = 17 THEN sstring
           END) va1l7
--, datastring OriginalDataString
FROM   cte
GROUP  BY check_content; `

it works perfectly on a reasonable number of rows but the whole file is 2.5m rows and i then get this error:

Warning: Null value is eliminated by an aggregate or other SET operation. Could not allocate space for object 'dbo.WORKFILE GROUP large record overflow storage: 140739078979584' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

And it fails to insert.

We got no inserted rows.

Solution

Please try the following solution based on tokenization.

It has no limitation on numbers of rows to process.

You would need to complete the SELECT clause, and include the rest of the columns with the appropriate data types.

dbfiddle

SQL

DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, check_content VARCHAR(2048));
INSERT @tbl (check_content) VALUES
('5*0121**01300421142116*CW*WAREHOUSE AND PREMISES*8497395000*01-APR-2016****18017262000*25-APR-2016*096G*26621354212*25-APR-2016*17-MAY-2016*');

-- INSERT INTO raw_epoch_historic ...
SELECT t.ID
  , c.value('(/root/r[1]/text())[1]', 'INT') as revaluation_id
  , c.value('(/root/r[2]/text())[1]', 'VARCHAR(10)') as billing_authority_code
--  continue here up to 17th column, via r[1 to 17]
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
    REPLACE(check_content, '*', ']]></r><r><![CDATA[') + 
    ']]></r></root>' AS XML)) AS t1(c);