Search code examples
powerbidaxpowerquerym

Replace or substitute using DAX or Query Editor - Power BI


I have a requirement where I have to loop through each character in a text field

For example:

Input Expected Output
a#123456; 12341 123456;12341
a123456 12341bd 123456;12341
a2017d 12341ds 12341
a123456/12341bd 123456;12341
n/a null

Basically I am cleaning my data here:

  1. I will remove any charachter that is not a number.
  2. A number is only valid if there are a minimum of 5 consecutive numbers.
  3. In the Input to differentiate the ID's a space or ";" or ',' or '/' could have been used. Multiple ID's will be present in some cases and not restricted to two.
  4. To keep it the same, I will replace all these connectors by ";"

Note:- My steps may not be right, but my expected output is what I am looking to get from the data.

How can we do this using DAX or Query Editor in Power BI?


Solution

  • I'll attempt this in two steps.

    First, let's remove anything that isn't a number or a separator. We can create a custom column to do this with this formula (assuming your starting column is named Raw):

    Text.Remove([Raw], List.Transform({33..43,60..255}, each Character.FromNumber(_)))
    

    This uses the Windows-1252 character numbering, but you could just write a big list of characters to remove using the Text.Remove function.

    This custom column now looks like this:

    Custom
    123456; 12341
    123456 12341
    2017  12341
    123456/12341
    /
    

    From here, the plan is to split it using each separator and filter out strings that are too short.

    The Text.SplitAny function allows us to specify multiple separators. For example, the formula Text.SplitAny("1;23 4.5-6/ 7:8,9", " ,-./:;") would return {"1","23","4","5","6","","7","8","9"}.

    Once we have a list of substrings, we can filter the ones we don't want with List.Select and then concatenate them from a list of substrings into a single string using Text.Combine.

    Putting all this together, we have this formula

    Text.Combine(List.Select(Text.SplitAny([Custom], " ,-./:;"), each Text.Length(_) > 4), ";")
    

    You can smash it all into one step if you prefer:

    = Table.AddColumn(#"Previous Step", "Clean",
          each Text.Combine(
                   List.Select(
                       Text.SplitAny(
                           Text.Remove(
                               [Raw],
                               List.Transform(
                                   {33..43,60..255},
                                   each Character.FromNumber(_)
                               )
                           ),
                           " ,-./:;"
                       ),
                       each Text.Length(_) > 4),
                   ";"
               )
           )