Tags: string, azure-data-explorer, kql

Kusto String Difference


I need help finding the difference between two strings. For example, the difference between the strings outlook and outlooka should be "a"; even just the number of characters that differ would work.

I am also fine with converting the strings to arrays and calculating the set difference.

Any help is much appreciated. Thank you.

I am trying to identify homoglyph domains with minor changes.


Solution

  • This query counts the occurrences of each character in each string and returns the differences (count in str2 minus count in str1), keeping only the characters whose counts differ.

    datatable(id:int, str1:string, str2:string)
    [
        1   ,"outlook"  ,"outlooka"
       ,2   ,"outlook"  ,"outlok"
       ,3   ,"outlook"  ,"outllooook"
       ,4   ,"outlook"  ,"lookout"
    ] 
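    // Expand both strings into one character per row (c), tagging each
    // character with its source string (s): "1" for str1, "2" for str2.
    | mv-apply c = extract_all("(.)", strcat(str1, str2)) to typeof(string)
              ,s = array_concat(repeat("1", strlen(str1)), repeat("2", strlen(str2))) to typeof(string) on
     (
          // Per character: occurrences in str2 minus occurrences in str1.
          summarize count_diff = countif(s == "2") - countif(s == "1") by c
          // Keep only the characters whose counts differ.
        | summarize char_diff = make_bag_if(bag_pack(c, count_diff), count_diff != 0)
     )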
    
    id  str1     str2        char_diff
    1   outlook  outlooka    {"a":1}
    2   outlook  outlok      {"o":-1}
    3   outlook  outllooook  {"o":2,"l":1}
    4   outlook  lookout     {}
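
  • For comparison, the set-difference idea mentioned in the question can be sketched as below. This is a minimal sketch and not part of the original answer; the column names chars1, chars2, only_in_str1 and only_in_str2 are made up for illustration. It assumes extract_all with a single capture group yields an array of single characters, as in the query above.

```kql
datatable(id:int, str1:string, str2:string)
[
    1   ,"outlook"  ,"outlooka"
   ,2   ,"outlook"  ,"outlok"
   ,3   ,"outlook"  ,"outllooook"
   ,4   ,"outlook"  ,"lookout"
]
// Split each string into an array of single characters (hypothetical column names).
| extend chars1 = extract_all("(.)", str1)
        ,chars2 = extract_all("(.)", str2)
// Distinct characters that appear in one string but not in the other.
| extend only_in_str2 = set_difference(chars2, chars1)
        ,only_in_str1 = set_difference(chars1, chars2)
| project id, str1, str2, only_in_str1, only_in_str2
```

    Because set_difference works on distinct values, it should report "a" for row 1 but cannot see repeated or dropped characters, so rows 2 to 4 would show no difference. That limitation is the reason for counting occurrences per character instead.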
