I've created a formula in excel that finds the first and last non blank cell and returns the corresponding date found in the startdate column. I need to repeat this on a large database (csv file of more than 1Gb in size of which there are about 200K rows with 1216 columns) and so rather than loading it up in excel I would like to use Power Query to do the hard work and return a simple table which details the QID headers (in one column) and the oldest and newest date for where a value is found, for each column (QID). Doing it in PowerQuery will then mean I can easily point to different source files and not have to load up the whole file into excel in the first place which is extremely slow not forgetting the formula itself!
Below is a picture of my efforts in excel with the formula for cell E19 as :
=TEXT(index($D$4:$D$17,MIN(IF(len(E$4:E$17)>0,row($D$4:$D$17)-3,""))),"dd/mm/yyyy")&" to "&TEXT(index($D$4:$D$17,Max(IF(len(E$4:E$17)>0,row($D$4:$D$17)-3,""))),"dd/mm/yyyy")
I have imported the table into Power query and thought my first step would be to get the headers only and transpose them into a new query / table but then I'm not sure how I can create two new columns one for lowest and highest date and reference each column in the original table (let's call it DataTable) based on each QID name or column number. I also will need to run a find and replace as blank cells need to be replaced with null so that it finds the true first and last entry (that could be done in notepad++ if that would be a more efficient way).
Please could someone provide some guidance in the steps I need to do?
Thanks
Rodp
========================= update 13/3/24
Thanks to Ron and Kevin for the advice so far. Both examples get me a little closer but they are not picking up the first on blank cell. It seems to be picking up the first cell outright.
I'd like to provide an example of the data here but pasting it here may not copy over the data as it's seen in excel (we're talking about empty cells vs blank cells) but here goes anyway. I've converted it to csv.
Here's a smallscreen shot to help
startDate,Attribute,Value
10/07/2022 06:19,QID270,
10/07/2022 08:06,QID270,
11/07/2022 11:57,QID270,
11/07/2022 11:59,QID270,
11/07/2022 12:08,QID270,
11/07/2022 12:18,QID270,Yes
11/07/2022 13:11,QID270,
11/07/2022 22:20,QID270,No
11/07/2022 22:34,QID270,
12/12/2022 10:15,QID270,
12/12/2022 10:16,QID270,
12/12/2022 10:16,QID270,No
12/12/2022 10:16,QID270,
13/12/2022 22:46,QID270,
13/12/2022 22:54,QID270,
13/12/2022 23:49,QID270,
13/12/2022 23:52,QID270,
The first nonblank cell should reuslt in a date: 11/07/2022 12:18 but at the moment powerquery returns 10/07/2022 06:19. Likewise the last entry should be 12/12/2022 but it's retuning 13/12/2022
I've tried two methods. I've made sure the empty cells in powerquery have null in them.
List First (returns 10/07/2022 06:19:04)
= Table.Group(Source, {"Attribute"}, {"Min Date", each List.First(List.RemoveNulls(List.RemoveFirstN(Source[startDate],0))), type nullable datetime})
List Min (returns 10/07/2022 to 13/12/2022)
= Table.Group(#"Changed Type", {"Attribute"}, {
{"First/Last", (t)=> Date.ToText(List.Min(t[startDate]),"dd/MM/yyyy"" to """) & Date.ToText(List.Max(t[startDate]), "dd/MM/yyyy")
}})
Thanks in advance Rodp
Here's one way to show what you show for your output in your screenshot:
The algorithm assumes that dates are in descending order as you show
let
//change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
//set datetime column to propert type
#"Changed Type" = Table.TransformColumnTypes(Source,{"StartDate", type datetime}),
//Demote headers and Transpose
#"Demoted Headers" = Table.DemoteHeaders(#"Changed Type"),
#"Transposed Table" = Table.Transpose(#"Demoted Headers"),
//Add a column showing the first and last dates
//output results as a string
#"Add First/Last" = Table.AddColumn(#"Transposed Table", "First/Last", (r)=>
let
li = List.RemoveFirstN(Record.ToList(r),1),
first = List.RemoveNulls(li){0},
last = List.RemoveNulls(List.Reverse(li)){0},
#"Last Date" = Date.From(Record.ToList(#"Transposed Table"{0}) {List.PositionOf(li,first,Occurrence.First)+1}),
#"First Date" = Date.From(Record.ToList(#"Transposed Table"{0}){List.PositionOf(li,last,Occurrence.Last)+1})
in
Date.ToText(#"First Date","dd/MM/yyyy"" to """) & Date.ToText(#"Last Date", "dd/MM/yyyy"), type text),
//Remove all except first and last columns
//Remove first row
result = Table.RemoveColumns(Table.RemoveFirstN(#"Add First/Last",1), List.RemoveLastN(List.RemoveFirstN(Table.ColumnNames(#"Add First/Last"),1),1))
in
result
Edit: Since you wrote the above was slow on your dataset, try the following where we use Unpivot and Table.Group to come up with the same result:
Edit2: Ensure blanks are ignored
let
//change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
//set datetime column to date
#"Changed Type" = Table.TransformColumnTypes(Source,{"StartDate", type date}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"StartDate"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Attribute"}, {
{"First/Last", (x)=> let t=Table.SelectRows(x, each [Value]<>null)
in
Date.ToText(List.Min(t[StartDate]),"dd/MM/yyyy"" to """) &
Date.ToText(List.Max(t[StartDate]), "dd/MM/yyyy"), type text
}}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Attribute", Order.Ascending}})
in
#"Sorted Rows"