Search code examples
sql-servert-sqlsql-server-2012sqlxml

Querying (a lot of) XML data for specific text with in xml Elements (Tsql)


Sorry: this post got a little long (wanted to make sure everything relevant was in)

Struggling to wrap my brain around this a little. I have a table that has 5 columns

ID (varchar)
Record value(XML)
date (datetime)
applicationtypeid (int)
applicationstatusid(int)

XML contains alot of data but part i'm interested in looks like this.

<GoodsAndServicesItemSummaryViewModelItems>
  <GoodsAndServicesMasterViewModel>
    <Id>1</Id>
    <ClassId>1</ClassId>
    <TermsText>text</TermsText>
    <TermsCreationType>ManuallyEntered</TermsCreationType>
    <Terms />

Specifically the "TermsCreationType" element. In it can be one of the following 3 strings:

  • ManuallyEntered
  • CopyFromExistingMarks
  • CopyFromPreapprovedTermsDatabase

Depending on what the customer files there could also be a mixture of all three and they could contain as many of these mixtures as they theoretically wanted i.e.:

 </PriorityClaimAdditionalDetailWizardStepViewModel>
  <GoodsAndServicesMasterViewModel>
    <Id>9</Id>
    <ClassId>2</ClassId>
    <TermsText>texty</TermsText>
    <TermsCreationType>ManuallyEntered</TermsCreationType>
    <Terms />
  </GoodsAndServicesMasterViewModel>
  <GoodsAndServicesMasterViewModel>
    <Id>10</Id>
    <ClassId>1</ClassId>
    <TermsText>text</TermsText>
    <TermsCreationType>ManuallyEntered</TermsCreationType>
    <Terms />
  </GoodsAndServicesMasterViewModel>
</GoodsAndServicesItemSummaryViewModelItems>
<GoodsAndServicesItemSummaryViewModelItemsUnMerged>
  <GoodsAndServicesMasterViewModel>
    <Id>9</Id>
    <ClassId>9</ClassId>
    <TermsText>test</TermsText>
    <TermsCreationType>CopyFromExistingMarks</TermsCreationType>
    <Terms />
  </GoodsAndServicesMasterViewModel>

I'm trying to find a count of how many records from a certain date range that

  1. ONLY contain one of the aforementioned (whether it appears once or even 50 times as long as that's the only value in that element)

  2. Records that contain any mixture of the three.


My attempt so far, for the "one element only" is thus:

SELECT Count (id) [pre-approved only]
FROM   [TMWebForms].[dbo].[webformapplication]
WHERE  trademarkid NOT IN (SELECT id
                           FROM   [TMWebForms].[dbo].[webformapplication]
                           WHERE  applicationtypeid = '5'
                                  AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
                                  AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "ManuallyEntered")', 'bit') = 1
                                  AND applicationstatusid = 50
                                  AND applicationtypeid = 5)
       AND id NOT IN (SELECT id
                      FROM   [TMWebForms].[dbo].[webformapplication]
                      WHERE  applicationtypeid = '5'
                             AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
                             AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "CopyFromExistingMark")', 'bit') = 1
                             AND applicationstatusid = 50
                             AND applicationtypeid = 5)
       AND createddate BETWEEN '2016-08-01' AND '2016-08-31'
       AND RECORDDATA.value('contains((//GoodsAndServicesWizardStepViewModel/GoodsAndServicesItemSummaryViewModelItems/GoodsAndServicesMasterViewModel/TermsCreationType/text())[1], "CopyFromPreapprovedTermsDatabase")', 'bit') = 1
       AND applicationstatusid = 50
       AND applicationtypeid = 5 
  1. This is long winded and not the best performance wise by a long shot! (i thought about converting it to a string then using "like" - but that is probably just as bad, if not worse)

  2. It doesn't pull exactly what i wanted back. To me it is searching only for "CopyFromPreapprovedTermsDatabase", but when interrogating the data, There are a few cases where this is the first line but, i can also see "manuallyentered" exists.

Any help appreciated here!


Solution

  • Try it like this

    (I had to add a root and add some opening and closing tags)

    DECLARE @xml XML=
    N'<SomeRoot>
      <GoodsAndServicesItemSummaryViewModelItems>
        <GoodsAndServicesMasterViewModel>
          <Id>9</Id>
          <ClassId>2</ClassId>
          <TermsText>texty</TermsText>
          <TermsCreationType>ManuallyEntered</TermsCreationType>
          <Terms />
        </GoodsAndServicesMasterViewModel>
        <GoodsAndServicesMasterViewModel>
          <Id>10</Id>
          <ClassId>1</ClassId>
          <TermsText>text</TermsText>
          <TermsCreationType>ManuallyEntered</TermsCreationType>
          <Terms />
        </GoodsAndServicesMasterViewModel>
      </GoodsAndServicesItemSummaryViewModelItems>
      <GoodsAndServicesItemSummaryViewModelItemsUnMerged>
        <GoodsAndServicesMasterViewModel>
          <Id>9</Id>
          <ClassId>9</ClassId>
          <TermsText>test</TermsText>
          <TermsCreationType>CopyFromExistingMarks</TermsCreationType>
          <Terms />
        </GoodsAndServicesMasterViewModel>
      </GoodsAndServicesItemSummaryViewModelItemsUnMerged>
    </SomeRoot>';
    
    DECLARE @tbl TABLE(ID VARCHAR(10),Record_Value XML,[date] DATETIME,applicationtypeid INT,applicationstautsid INT);
    INSERT INTO @tbl VALUES('SomeTest',@xml,GETDATE(),11,22);
    

    --The CTE will select all columns and add the actual count of the TermsCreationType with the given text()

    WITH CTE AS
    (
        SELECT *
              ,Record_Value.value('count(//TermsCreationType[text()="ManuallyEntered"])','int') AS CountManuallyEntered
              ,Record_Value.value('count(//TermsCreationType[text()="CopyFromExistingMarks"])','int') AS CountCopyFromExistingMarks
              ,Record_Value.value('count(//TermsCreationType[text()="CopyFromPreapprovedTermsDatabase"])','int') AS CountCopyFromPreapprovedTermsDatabase
        FROM @tbl
    )
    

    --The final SELECT uses as big CASE WHEN hierarchy to analyse the counts

    SELECT *
          ,CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'None' 
           ELSE
               CASE WHEN CTE.CountManuallyEntered>0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'Manually' 
               ELSE
                   CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks>0 AND CTE.CountCopyFromPreapprovedTermsDatabase=0 THEN 'Existing' 
                   ELSE
                       CASE WHEN CTE.CountManuallyEntered=0 AND CTE.CountCopyFromExistingMarks=0 AND CTE.CountCopyFromPreapprovedTermsDatabase>0 THEN 'Preapproved' 
                       ELSE
                       'Mixed'
                       END
                   END
               END
           END AS CountAnalysis
    FROM CTE;