Search code examples
excelpowerquery

Power Query Find First and Last non blank cell in each column and reference the startdate column


I've created a formula in excel that finds the first and last non blank cell and returns the corresponding date found in the startdate column. I need to repeat this on a large database (csv file of more than 1Gb in size of which there are about 200K rows with 1216 columns) and so rather than loading it up in excel I would like to use Power Query to do the hard work and return a simple table which details the QID headers (in one column) and the oldest and newest date for where a value is found, for each column (QID). Doing it in PowerQuery will then mean I can easily point to different source files and not have to load up the whole file into excel in the first place which is extremely slow not forgetting the formula itself!

Below is a picture of my efforts in excel with the formula for cell E19 as :

=TEXT(index($D$4:$D$17,MIN(IF(len(E$4:E$17)>0,row($D$4:$D$17)-3,""))),"dd/mm/yyyy")&" to "&TEXT(index($D$4:$D$17,Max(IF(len(E$4:E$17)>0,row($D$4:$D$17)-3,""))),"dd/mm/yyyy")

I have imported the table into Power query and thought my first step would be to get the headers only and transpose them into a new query / table but then I'm not sure how I can create two new columns one for lowest and highest date and reference each column in the original table (let's call it DataTable) based on each QID name or column number. I also will need to run a find and replace as blank cells need to be replaced with null so that it finds the true first and last entry (that could be done in notepad++ if that would be a more efficient way).

Please could someone provide some guidance in the steps I need to do?

Thanks

Rodp

enter image description here

========================= update 13/3/24

Thanks to Ron and Kevin for the advice so far. Both examples get me a little closer but they are not picking up the first on blank cell. It seems to be picking up the first cell outright.

I'd like to provide an example of the data here but pasting it here may not copy over the data as it's seen in excel (we're talking about empty cells vs blank cells) but here goes anyway. I've converted it to csv.

Here's a smallscreen shot to help enter image description here

startDate,Attribute,Value
10/07/2022 06:19,QID270,
10/07/2022 08:06,QID270,
11/07/2022 11:57,QID270,
11/07/2022 11:59,QID270,
11/07/2022 12:08,QID270,
11/07/2022 12:18,QID270,Yes
11/07/2022 13:11,QID270,
11/07/2022 22:20,QID270,No
11/07/2022 22:34,QID270,
12/12/2022 10:15,QID270,
12/12/2022 10:16,QID270,
12/12/2022 10:16,QID270,No
12/12/2022 10:16,QID270,
13/12/2022 22:46,QID270,
13/12/2022 22:54,QID270,
13/12/2022 23:49,QID270,
13/12/2022 23:52,QID270,

The first nonblank cell should reuslt in a date: 11/07/2022 12:18 but at the moment powerquery returns 10/07/2022 06:19. Likewise the last entry should be 12/12/2022 but it's retuning 13/12/2022

I've tried two methods. I've made sure the empty cells in powerquery have null in them.

List First (returns 10/07/2022 06:19:04)

= Table.Group(Source, {"Attribute"}, {"Min Date", each List.First(List.RemoveNulls(List.RemoveFirstN(Source[startDate],0))), type nullable datetime})

List Min (returns 10/07/2022 to 13/12/2022)

= Table.Group(#"Changed Type", {"Attribute"}, {
    {"First/Last", (t)=> Date.ToText(List.Min(t[startDate]),"dd/MM/yyyy"" to """) & Date.ToText(List.Max(t[startDate]), "dd/MM/yyyy")
    }})

Thanks in advance Rodp


Solution

  • Here's one way to show what you show for your output in your screenshot:

    • Transpose the data
    • Add a column in which we calculate the first and last dates
      • Remove all the nulls each row.
      • Find the position of the first and last (non-null) entry
      • Return the corresponding Date from the first row (which has all the dates from the original table)
      • Create a formatted string as you show in your screenshot
    • Delete the first row and also all the columns except the first and last.

    The algorithm assumes that dates are in descending order as you show

    let
    
    //change next line to reflect actual data source
        Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
    
    //set datetime column to propert type
        #"Changed Type" = Table.TransformColumnTypes(Source,{"StartDate", type datetime}),
    
    //Demote headers and Transpose
        #"Demoted Headers" = Table.DemoteHeaders(#"Changed Type"),
        #"Transposed Table" = Table.Transpose(#"Demoted Headers"),
    
    //Add a column showing the first and last dates
    //output results as a string
        #"Add First/Last" = Table.AddColumn(#"Transposed Table", "First/Last", (r)=>
            let 
                li = List.RemoveFirstN(Record.ToList(r),1),
                first = List.RemoveNulls(li){0},
                last = List.RemoveNulls(List.Reverse(li)){0},
                #"Last Date" = Date.From(Record.ToList(#"Transposed Table"{0}) {List.PositionOf(li,first,Occurrence.First)+1}),
                #"First Date" = Date.From(Record.ToList(#"Transposed Table"{0}){List.PositionOf(li,last,Occurrence.Last)+1})
            in 
                Date.ToText(#"First Date","dd/MM/yyyy"" to """) & Date.ToText(#"Last Date", "dd/MM/yyyy"), type text),
    
    //Remove all except first and last columns
    //Remove first row
        result = Table.RemoveColumns(Table.RemoveFirstN(#"Add First/Last",1), List.RemoveLastN(List.RemoveFirstN(Table.ColumnNames(#"Add First/Last"),1),1))
    in
    
        result
    

    enter image description here

    Edit: Since you wrote the above was slow on your dataset, try the following where we use Unpivot and Table.Group to come up with the same result:

    Edit2: Ensure blanks are ignored

    let
    
    //change next line to reflect actual data source
        Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
    
    //set datetime column to date
        #"Changed Type" = Table.TransformColumnTypes(Source,{"StartDate", type date}),
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"StartDate"}, "Attribute", "Value"),
        #"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Attribute"}, {
            {"First/Last", (x)=> let t=Table.SelectRows(x, each [Value]<>null) 
                in 
                    Date.ToText(List.Min(t[StartDate]),"dd/MM/yyyy"" to """) & 
                    Date.ToText(List.Max(t[StartDate]), "dd/MM/yyyy"), type text
            }}),
        #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Attribute", Order.Ascending}})
    in
        #"Sorted Rows"