Search code examples
regexazureazure-data-lakeu-sql

Azure / U-SQL - Regex substitution


I have some data containing spaces () and hyphens (-) that I'd like to convert to underscore characters (_). In other languages (ie R) I can write something like this:

var1 <- gsub(var1, "s+|\\-", "_")

which will look for multiple characters and convert them all to some other character.

Is there a way to do this in U-SQL?

EDIT:

I tried this and it ran without error but did not change the data:

@t2 = SELECT var1,
           var2,
           var3.Replace("s+|\\'|\\-","_") AS var3          
    FROM @t1;

Solution

  • You are almost there but you are using the System.String.Replace instead of the one for a regular expression. So change

    @t2 = SELECT var1,
           var2,
           var3.Replace("s+|\\'|\\-","_") AS var3          
    FROM @t1;
    

    to

    @t2 = SELECT var1,
           var2,
           Regex.Replace(var3, "s+|\\'|\\-", "_") AS var3          
    FROM @t1;
    

    Edit: I am not an expert on regular expression so I did not validate the expression itself.