Search code examples
sql-serverxmlt-sqlxqueryshred

Shredding XML in SQL but cross apply adds unwanted records


I have to flatten out this XML source into one SQL server table. There are multiple ResponseID per source file and multiple TextAnalyticsItem per ResponseID. I want to limit the records so that I only get the TextAnalyticsItems related to the ResponseID. However, the cross apply method gives me all ResponseId with all TextAnalyticsItem. How do I prevent the additional records?

DECLARE @XMLToParse XML;
SET @XMLToParse = '  
<Responses>
  <Response>
    <ResponseId>7662934</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234909</RespondentId>
    <QuestionId>141757</QuestionId>
    <ScaleId>3401</ScaleId>
    <AnswerId>17130</AnswerId>
    <ResponseMemo>Useful</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-06T09:07:40</CompletedDate>
    <ModifiedDate>2020-07-06T09:07:41</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Values Standards</Level1>
        <Level2>Better/Best/Brilliant</Level2>
        <Level3>Positive</Level3>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
      <TextAnalyticsItem>
        <Level1>All-Behaviors</Level1>
        <Level2>Positive_</Level2>
        <Sentiment>1</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
   <Response>
    <ResponseId>7662078</ResponseId>
    <SurveyId>123</SurveyId>
    <RespondentId>234826</RespondentId>
    <QuestionId>141756</QuestionId>
    <ScaleId>3400</ScaleId>
    <AnswerId>17129</AnswerId>
    <ResponseMemo>Ghjlkk</ResponseMemo>
    <ResponseRank>0</ResponseRank>
    <ResponseState>0</ResponseState>
    <CompletedDate>2020-07-03T07:17:31</CompletedDate>
    <ModifiedDate>2020-07-03T07:17:31</ModifiedDate>
    <LanguageId>220</LanguageId>
    <ResponseNum>0</ResponseNum>
    <ResponseDate />
    <TextAnalyticsData>
      <TextAnalyticsItem>
        <Level1>Nonactionable</Level1>
        <Sentiment>0</Sentiment>
      </TextAnalyticsItem>
    </TextAnalyticsData>
  </Response>
</Responses>'
 SELECT xmlData.A.value('ResponseId[1]', 'VARCHAR(100)') AS ResponseId, 
       xmlData.A.value('SurveyId[1]', 'VARCHAR(100)') AS SurveyId, 
       xmlData.A.value('RespondentId[1]', 'VARCHAR(100)') AS RespondentId, 
       xmlData.A.value('TextAnalyticsData[1]', 'VARCHAR(100)') AS TextAnalyticsData, 
       tbl1.TxtItems.value('Level1[1]', 'VARCHAR(100)') AS Level1, 
       tbl1.TxtItems.value('Level2[1]', 'VARCHAR(100)') AS Level2, 
       tbl1.TxtItems.value('Level3[1]', 'VARCHAR(100)') AS Level3, 
       tbl1.TxtItems.value('Sentiment[1]', 'VARCHAR(100)') AS Sentiment
FROM @XMLToParse.nodes('Responses/Response/TextAnalyticsData/TextAnalyticsItem') tbl1(TxtItems)
     cross apply @XMLToParse.nodes('Responses/Response') xmlData(A)

ORDER BY ResponseId, 
         RespondentId;

So instead of

ResponseId        SurveyId          RespondentId      TextAnalyticsData                                                     Level1            Level2            Level3     Sentiment
---------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------
7662078           123               234826            Nonactionable0                                                        Values Standards  Better/Best/BrilliPositive          1
7662078           123               234826            Nonactionable0                                                        All-Behaviors     Positive_         NULL              1
7662078           123               234826            Nonactionable0                                                        Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Values Standards  Better/Best/BrilliPositive          1
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 All-Behaviors     Positive_         NULL              1

I want to get

ResponseId        SurveyId          RespondentId      TextAnalyticsData                                                     Level1            Level2            Level3     Sentiment
---------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------
7662078           123               234826            Nonactionable0                                                        Nonactionable     NULL              NULL              0
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 Values Standards  Better/Best/BrilliPositive          1
7662934           123               234909            Values StandardsBetter/Best/BrilliantPositive1All-BehaviorsPositive_1 All-Behaviors     Positive_         NULL              1

Any suggestions? Thanks


Solution

  • Check it out how to achieve what you need. It simulates one-to-many relationship.

    I am not sure what was the intent for the TextAnalyticsData column. That's why I commented it out.

    SQL

    DECLARE @XMLToParse XML = 
    N'<Responses>
      <Response>
        <ResponseId>7662934</ResponseId>
        <SurveyId>123</SurveyId>
        <RespondentId>234909</RespondentId>
        <QuestionId>141757</QuestionId>
        <ScaleId>3401</ScaleId>
        <AnswerId>17130</AnswerId>
        <ResponseMemo>Useful</ResponseMemo>
        <ResponseRank>0</ResponseRank>
        <ResponseState>0</ResponseState>
        <CompletedDate>2020-07-06T09:07:40</CompletedDate>
        <ModifiedDate>2020-07-06T09:07:41</ModifiedDate>
        <LanguageId>220</LanguageId>
        <ResponseNum>0</ResponseNum>
        <ResponseDate />
        <TextAnalyticsData>
          <TextAnalyticsItem>
            <Level1>Values Standards</Level1>
            <Level2>Better/Best/Brilliant</Level2>
            <Level3>Positive</Level3>
            <Sentiment>1</Sentiment>
          </TextAnalyticsItem>
          <TextAnalyticsItem>
            <Level1>All-Behaviors</Level1>
            <Level2>Positive_</Level2>
            <Sentiment>1</Sentiment>
          </TextAnalyticsItem>
        </TextAnalyticsData>
      </Response>
       <Response>
        <ResponseId>7662078</ResponseId>
        <SurveyId>123</SurveyId>
        <RespondentId>234826</RespondentId>
        <QuestionId>141756</QuestionId>
        <ScaleId>3400</ScaleId>
        <AnswerId>17129</AnswerId>
        <ResponseMemo>Ghjlkk</ResponseMemo>
        <ResponseRank>0</ResponseRank>
        <ResponseState>0</ResponseState>
        <CompletedDate>2020-07-03T07:17:31</CompletedDate>
        <ModifiedDate>2020-07-03T07:17:31</ModifiedDate>
        <LanguageId>220</LanguageId>
        <ResponseNum>0</ResponseNum>
        <ResponseDate />
        <TextAnalyticsData>
          <TextAnalyticsItem>
            <Level1>Nonactionable</Level1>
            <Sentiment>0</Sentiment>
          </TextAnalyticsItem>
        </TextAnalyticsData>
      </Response>
    </Responses>';
    
     SELECT r.value('(ResponseId/text())[1]', 'VARCHAR(100)') AS ResponseId
        , r.value('(SurveyId/text())[1]', 'VARCHAR(100)') AS SurveyId
        , r.value('(RespondentId/text())[1]', 'VARCHAR(100)') AS RespondentId
           --xmlData.A.value('TextAnalyticsData[1]', 'VARCHAR(100)') AS TextAnalyticsData, 
        , a.value('(Level1/text())[1]', 'VARCHAR(100)') AS Level1
        , a.value('(Level2/text())[1]', 'VARCHAR(100)') AS Level2
        , a.value('(Level3/text())[1]', 'VARCHAR(100)') AS Level3
        , a.value('(Sentiment/text())[1]', 'VARCHAR(100)') AS Sentiment
    FROM @XMLToParse.nodes('/Responses/Response') t1(r)
         CROSS APPLY t1.r.nodes('TextAnalyticsData/TextAnalyticsItem[Level1/text()]') t2(a)
    --ORDER BY ResponseId, 
    --         RespondentId;
    

    Output

    +------------+----------+--------------+------------------+-----------------------+----------+-----------+
    | ResponseId | SurveyId | RespondentId |      Level1      |        Level2         |  Level3  | Sentiment |
    +------------+----------+--------------+------------------+-----------------------+----------+-----------+
    |    7662934 |      123 |       234909 | Values Standards | Better/Best/Brilliant | Positive |         1 |
    |    7662934 |      123 |       234909 | All-Behaviors    | Positive_             | NULL     |         1 |
    |    7662078 |      123 |       234826 | Nonactionable    | NULL                  | NULL     |         0 |
    +------------+----------+--------------+------------------+-----------------------+----------+-----------+