I have a very long and complex string that comes in with New line breaks - I am having a hard time parsing. I need to be able to create a select query with a column for each of the below fields.
The ideal would be find the new line break
- for each line - go back to the :
everything before the colon should be the name of the column, and everything between :
and new ling break
should be the data in the field.
All the data is returned as strings so i'm just building a select statement for each of the below lines. I'm not sure if that is at all possible.
The second alternative, to hard code and say something like CHARINDEX ( 'Home Phone:' ,notes, 0)
Where I find the home phone string and then pull everything between the :
and the new ling break
after specifying the string.
In this case each select item in my query will say - find string "Home Phone" and pull what comes after the colon, or find string "School Name" etc.
This is what the data looks like (in one all string called notes
):
Home Phone: 1234567890
Cell Phone: 1234567890
Date of Birth: 01/01/1971
School Name: James Jones High School
Address:123 Main Street
School City: Queens
School State: PA
School Zip: 32112
Years Teaching: 12
Grade Levels: Middle School
Total Students: 120
Subject: Music:
How did they hear: Other, provide more info: Former partner teacher in the Middle School
Type: Public/Charter
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc
so the output looks something like this (with all of the long questions answered as well in each field).
Home Phone Cell Phone Date of Birth: … Type: Question 1 : Question 2: Question 3:
1234567890 1234567890 1/1/1971 Public/Charter aaaaaaaa aaaaaaaaaaaaa. bbb bbbbbbbbbb ccccccccccccccccccccccc
I'm not sure if that makes sense -- but any and all suggestions are really appreciated.
Code to pull the substring and the new line char -- but this is hard coded. I can't figure out how to do it dynamically.
SELECT ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: '))) as 'beggining',
ltrim(rtrim(CHARINDEX ( CHAR(10) ,notes, 0))) as 'ending',
SUBSTRING(notes,(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')),(LEN('Home Phone: '))) as 'home phone',
FROM table a
Thank you!
A lot of this credit (90%) should go to Alex K who provided an indepth answer regarding finding the nth occurrence of a character
SQL Server - find nth occurrence in a string
I took that answer, adjusted it for your question and then applied a PIVOT to break it into the desired rows/columns. This method should be able to create the desired output for as many unique question sets as you need provided they always have the same logic (each question/answer separated by a line break).
--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
DROP TABLE #Results
CREATE TABLE #Results
(ID INT IDENTITY(1,1) NOT NULL,
Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
(Notes)
VALUES
('Home Phone: 1234567890
Cell Phone: 1234567890
Date of Birth: 01/01/1971
School Name: James Jones High School
Address:123 Main Street
School City: Queens
School State: PA
School Zip: 32112
Years Teaching: 12
Grade Levels: Middle School
Total Students: 120
Subject: Music:
How did they hear: Other, provide more info: Former partner teacher in the Middle School
Type: Public/Charter '),
('Home Phone: test
Cell Phone: test
Date of Birth: test
School Name: test
Address:test
School City: test
School State: test
School Zip: test
Years Teaching: test
Grade Levels: test
Total Students: test
Subject: test
How did they hear: test
Type: test ');
--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS
(SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
FROM #Results
UNION ALL
SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
FROM cte
WHERE
pos >0),
--2nd CTE breaks each question set into it's own row
cte2
AS
(SELECT ID, Notes,Starts, Pos,
SUBSTRING(Notes, Starts,
CASE
WHEN pos > 0 THEN (pos - starts)
ELSE LEN(notes)
END) AS Token
FROM cte),
--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
(SELECT ID,
LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question,
LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
FROM cte2)
--Pivot separates each Question/Answer row into it's own column
SELECT *
FROM
(SELECT ID, Question, Answer
FROM cte3) AS a
PIVOT
(MAX(Answer)
FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
[School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt
I put comments on each section to hopefully explain my logic but let me know if you have any questions.
EDIT: Dynamic Pivot
It is possible to use dynamic SQL to create a PIVOT that will automatically pick up on all the "Question" columns and adjust accordingly. I do not believe it can be done in one step since I had to use the multiple CTEs. What I would do is take the above steps used to create CTE, CTE2, and CTE3 (basically everything before the PIVOT query) and CREATE a VIEW of those steps, then with that view do the following (for my example the view is called "Questionaire")
DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)
SET @columns = STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
FROM questionaire AS q
FOR XML PATH(''), TYPE
).value('.','NVARCHAR(MAX)')
,1,1,'')
SET @query = 'SELECT ID, '+ @columns +' FROM
(
SELECT ID, Answer, Question
FROM questionaire
) AS a
PIVOT
(
MAX(Answer)
FOR Question IN(' +@columns+')
) AS p'
EXECUTE(@query)