sql-server parsing select find charindex

SQL Parse out multiple substrings

I have a very long and complex string that comes in with New line breaks - I am having a hard time parsing. I need to be able to create a select query with a column for each of the below fields.

The ideal would be find the new line break - for each line - go back to the : everything before the colon should be the name of the column, and everything between : and new ling break should be the data in the field.

All the data is returned as strings so i'm just building a select statement for each of the below lines. I'm not sure if that is at all possible.

The second alternative, to hard code and say something like CHARINDEX ( 'Home Phone:' ,notes, 0) Where I find the home phone string and then pull everything between the : and the new ling break after specifying the string.

In this case each select item in my query will say - find string "Home Phone" and pull what comes after the colon, or find string "School Name" etc.

This is what the data looks like (in one all string called notes):

Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc

so the output looks something like this (with all of the long questions answered as well in each field).

Home Phone  Cell Phone  Date of Birth:  …   Type:               Question 1 :                Question 2:    Question 3: 
1234567890  1234567890  1/1/1971            Public/Charter      aaaaaaaa aaaaaaaaaaaaa.     bbb bbbbbbbbbb ccccccccccccccccccccccc

I'm not sure if that makes sense -- but any and all suggestions are really appreciated.

Code to pull the substring and the new line char -- but this is hard coded. I can't figure out how to do it dynamically.

SELECT  ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: '))) as 'beggining',
        ltrim(rtrim(CHARINDEX ( CHAR(10) ,notes, 0)))   as 'ending',
        SUBSTRING(notes,(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')),(LEN('Home Phone: '))) as 'home phone',    
FROM    table a

Thank you!

Solution

A lot of this credit (90%) should go to Alex K who provided an indepth answer regarding finding the nth occurrence of a character

SQL Server - find nth occurrence in a string

I took that answer, adjusted it for your question and then applied a PIVOT to break it into the desired rows/columns. This method should be able to create the desired output for as many unique question sets as you need provided they always have the same logic (each question/answer separated by a line break).

--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results

CREATE TABLE #Results 
    (ID INT IDENTITY(1,1) NOT NULL,
    Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
    (Notes)
VALUES
    ('Home Phone: 1234567890  
    Cell Phone: 1234567890  
    Date of Birth: 01/01/1971 
    School Name: James Jones High  School 
    Address:123 Main Street 
    School City: Queens  
    School State: PA  
    School Zip: 32112 
    Years Teaching: 12  
    Grade Levels: Middle School  
    Total Students: 120  
    Subject: Music:   
    How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
    Type: Public/Charter '),
    ('Home Phone: test  
    Cell Phone: test 
    Date of Birth: test
    School Name: test
    Address:test 
    School City: test 
    School State: test  
    School Zip: test 
    Years Teaching: test 
    Grade Levels: test 
    Total Students: test
    Subject: test   
    How did they hear:  test 
    Type: test ');

--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS

    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
    FROM #Results
    UNION ALL
    SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
    FROM cte
    WHERE
        pos >0),

--2nd CTE breaks each question set into it's own row
cte2
AS
    (SELECT ID, Notes,Starts, Pos,
        SUBSTRING(Notes, Starts,
            CASE
                WHEN pos > 0 THEN (pos - starts)
                ELSE LEN(notes)
            END) AS Token
    FROM cte),

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
    (SELECT ID, 
        LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question, 
        LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
    FROM cte2)

--Pivot separates each Question/Answer row into it's own column
SELECT *
FROM
    (SELECT ID, Question, Answer
    FROM cte3) AS a
PIVOT
    (MAX(Answer)
    FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
                        [School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt

I put comments on each section to hopefully explain my logic but let me know if you have any questions.

EDIT: Dynamic Pivot

It is possible to use dynamic SQL to create a PIVOT that will automatically pick up on all the "Question" columns and adjust accordingly. I do not believe it can be done in one step since I had to use the multiple CTEs. What I would do is take the above steps used to create CTE, CTE2, and CTE3 (basically everything before the PIVOT query) and CREATE a VIEW of those steps, then with that view do the following (for my example the view is called "Questionaire")

DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)

SET @columns =  STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
        FROM questionaire AS q
        FOR XML PATH(''), TYPE
        ).value('.','NVARCHAR(MAX)')
        ,1,1,'')

SET @query =    'SELECT ID, '+ @columns +' FROM
        (
            SELECT ID, Answer, Question
            FROM questionaire
        ) AS a
        PIVOT
        (
            MAX(Answer)
            FOR Question IN(' +@columns+')
        ) AS p'
EXECUTE(@query)