Search code examples
powerquery

PowerBI filter folder files via filename


I want to build a PowerBI report using many excel files from a folder. I want this report to be dynamic and be able to load all the files by itself.

Unfortunately, I cannot expect my folder to only contain the files I want. I can, however, expect my source filenames to match the following regex: /.*_.*_(\d{6})\.xlsx/. I cannot find a way to match the file names against the regex. The closest way I've found is to filter the files by several conditionals, like so:

= Table.SelectRows(Origen, each ([Extension] = ".xlsx") and not Text.StartsWith([Name], "~"))

However, this approach is too permissive, tedious and error prone. It may allow an unwanted file to be used as a data source.

The Problem

I want to use only the files whose name:

  • Starts with an arbitrary string, then has one underscore
  • Continues with an arbitrary string, then has a second underscore
  • Continues with a 6 digit string
  • ends with .xlsx

Again, I cannot expect the folder to only contain these types of files. I am able to filter out the unwqanted files using javascript and the above regex, but I do not know how to achieve this in PowerBI. I have looked at this question from microsoft fabric but I do not fully understand how to adapt it to my problem.


Solution

  • Assuming Column1 has the filenames to test

    #PriorStepName = 
    fRegex=(text,regex)=>Web.Page("<script>var x='"&text&"';var y=new RegExp('"&regex&"','g');var b=x.match(y);document.write(b);</script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0} ,
    #"Added Custom" = Table.AddColumn(#PriorStepName, "Output",  each try fRegex([Column1], ".*_.*_(\\d{6}).xlsx") otherwise null ),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Output] <> null and output <> "null")
    in #"Filtered Rows"
    

    enter image description here