Search code examples
azurereporting-serviceskql

Kql Query Not Successful


I am trying to generate a report on the usage of Sensitivity Labels based on Department,Users,and certain operations mentioned in the query.

I am trying to combine 2 tables.

The first one is MicrosoftPurviewInformationProtection which has information about the Sensitivity Label, the Operations, the Workload, and the UserId.While the second table is IdentityInfo, which has the User info such as AccountUPN ,Mail Address and Department name.

The issue that occurs is that on some cases 'MicrosoftPurviewInformationProtection'table's UserId column has the mailadress, for example, abc@xyz.com and for some cases, it has the UPN (Employee number), 123@xyz.com.

So I am trying to join both of these tables to get all of the data together, so that I can get the desired result.

I am trying to join the 'MicrosoftPurviewInformationProtection' table first with entries of AccountUPN from 'IdentityInfo' table, then trying to join 'MicrosoftPurviewInformationProtection' table with 'MailAddress' from 'IdentityInfo' table, and then combining them with a union.

My Query looks like this:

MicrosoftPurviewInformationProtection
| join kind=leftouter  IdentityInfo on ($left.UserId==$right.AccountUPN)
| union (MicrosoftPurviewInformationProtection | join kind = leftouter  IdentityInfo on ($left.UserId==$right.MailAddress))
| where Operation in~ ("SensitivityLabelApplied",
    "FileSensitivityLabelApplied",
    "FileSensitivityLabelChanged",
    "SensitivityLabelUpdated",
    "SiteSensitivityLabelApplied",
    "SiteSensitivityLabelChanged")
| extend ProcessName= tostring(Common.ProcessName)
| extend Apps = strcat(Application,ProcessName)
| summarize count() by Operation, SensitivityLabelId, Department,Apps,Workload,AccountUPN,MailAddress

While I run this and see the required fields, I do not know if I can trust this data, primarily because of 2 reasons.

  1. I have never used union and join, I do not know if this is accurately doing what I am trying to do.
  2. A lot of fields(AccountUPN/MailAddress) have no entry for a lot of rows.

Question 1: Can you please help me understand if what I am doing is correct and if it is, why am I seeing blanks.

Question 2: Also, IdentityInfo is not a static table, it has multiple entries and keeps updating with multiple entries of the same user over a period of time. If there is a table out there which has Information about a User's UPN,MailAddress and Department and only updates when a user is created, it will help me a lot. Please do let me know.

EDIT : FIRST TABLE CAN ALSO HAVE SAME VALUES FOR MULTIPLE ROWS,BUT THERE IS A UNIQUE ID COLUMN THAT SEPARATES THE DATA. I NEED ALL THE DATA FROM THE FIRST TABLE.


Solution

  • You are using a left outer join to join the MicrosoftPurviewInformationProtection table with the IdentityInfo table twice, once on the AccountUPN column and once on the MailAddress column. You are then using a union to combine the results of the two joins. When joining the tables for the first time on matching AccountUPN column and UserId column, only all the columns from the left table(MicrosoftPurviewInformationProtection) will have data and only matching values from the right table will have values. All other non matching rows will have null(blank values). This will happen for the second join also. This is the reason for lot of fields have empty rows. This will also result in duplicate rows for same row in left table. To avoid this, you can take only the non-null row for each SensitivityLabelId after doing the union of both left outer join result. Below is the code,

    MicrosoftPurviewInformationProtection
    | join kind=leftouter IdentityInfo on (($left.UserId==$right.AccountUPN))
    | union (MicrosoftPurviewInformationProtection | join kind = leftouter IdentityInfo on ($left.UserId==$right.MailAddress))
    | sort by SensitivityLabelId asc , Department desc
    | extend Rank= row_number(1, prev(SensitivityLabelId)!=SensitivityLabelId)
    | where Rank == 1
    //add remaining codes
    

    This code will filter only one row for each SensitivityLabelId field. Even when multiple entries for same row is there in IdentityInfo table, the above code will remove the duplicates in the result set.

    demo

    demo2