Search code examples
regexazure-data-explorerkql

Azure Kusto - how to fetch urls from a string using parse


let T = datatable(Id:int, Text:string)
[
   1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
   2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
];
T | parse Text with * "someurl.com" myurl ">" * | project Id, myurl 

Output
=========
Id  myurl
1   
2   /fileId/newedit/12702480

Need a way to parse Text field and extract url from it. Content of the text field is html body. Using parse works, if there are more characters after the url Id like ">" or blank space, however if the Text field ends with the url id, it doesn't work. Url Id is not fixed length. If not parse, is there any other way to extract everything from someurl.com until the id, irrespective of whether the url is in the middle of the string or at the end?


Solution

  • let T = datatable(Id:int, Text:string)
    [
       1, "SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844",
       2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
    ];
    T | extend URL=extract('&nbsp;.?(https://[a-zA-Z0-9/.]+)', 1, Text)
    

    +1 to Yoni L.'s answer above. The above will include the https.... not sure if this is what you are looking for.

    Output

    URL
    1   https://someurl.com/fileId/edit/12649844
    2   https://someurl.com/fileId/newedit/71244