I have a current sheet that is pulling from multiple sources and has multiple rows and columns. Spreadsheet is similar to this.
File_date, Project#, Sugg_Start_date,....
I am attempting to add an additional column named First_Sugg_Start. The issue is that with every daily data dump, if the Sugg_Start_date was not started, that date adjusts to the current date of the data dump. I need to know the first date of the Sugg_Start_date for all projects. But, I only want to see the projects that show up on the most recent data dump.
Example
Date | Project# | Sugg_Start |
---|---|---|
01/01/2025 | 123456 | 01/01/2025 |
01/01/2025 | 123457 | 01/01/2025 |
01/01/2025 | 123458 | 01/01/2025 |
01/01/2025 | 123459 | 01/01/2025 |
01/02/2025 | 123457 | 01/02/2025 |
01/02/2025 | 123458 | 01/02/2025 |
01/02/2025 | 123459 | 01/02/2025 |
01/03/2025 | 123457 | 01/03/2025 |
01/03/2025 | 123458 | 01/03/2025 |
01/03/2025 | 123459 | 01/03/2025 |
The output I would like is
Date | Project# | Sugg_Start_Date | First_Sugg_Start |
---|---|---|---|
01/03/2025 | 123457 | 01/03/2025 | 01/01/2025 |
01/03/2025 | 123458 | 01/03/2025 | 01/01/2025 |
01/03/2025 | 123459 | 01/03/2025 | 01/01/2025 |
(I have additional columns I would not like to filter out)
The issue I am having is that if I select by earliest First_Sugg_Start, I cannot display the most recent data contained in File_date. And, when I do filter, I lose the remaining columns that I also need as part of the report (for example PM, description, item, etc....)
I have tried to make an additional table, index on the Project# and pull that data in, but it fails. I have since written a query.
Let
Source = Folder.Files("C:\6.0\Task List"),
#"Filter Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name, "Source.Name"]),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1". "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#Expanded Table Column1", {{"Source.Name", type text}, {"Date", type date}, {"Project#", type text}, {"Sugg_Start", type date}, {"Task#", type text}}),
#"Added Custom" = Table.AddColumn(#"ChangedType, "Project#_with_Task", each [Project#]&" "&[Task#]),
#"Grouped Rows" = Table.Group(#"Added Custom", {Project#_with_Task", "Sugg_Start}, {"First_Sugg_Start"}, each List.First([Sugg_Start]), type date),
#"Added Custom1" = Table.AddColumn(#Grouped Rows", [First_Sugg_Start])
in
#"Added Custom1"
Everything runs fine until I get to the last section:
{"First_Sugg_Start"}, each List.First([Sugg_Start]), type date),
#"Added Custom1" = Table.AddColumn(#Grouped Rows", [First_Sugg_Start])
in
#"Added Custom1"
Thanks.
Seems to me that your approach is rather too complicated for a fairly simple task:
let
//Replace with your own data source
Source = Excel.CurrentWorkbook(){[Name="Table2]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Sugg_Start", type date}}),
//Group by Project# to find the first Sugg_Start_date
Grouped = Table.Group(#"Changed Type", {"Project#"}, {{"First_Sugg_Start", each List.Min([Sugg_Start]), type date}}),
// Merge this result back with the original data
Merged = Table.NestedJoin(#"Changed Type", {"Project#"}, Grouped, {"Project#"}, "Grouped", JoinKind.LeftOuter),
Expanded = Table.ExpandTableColumn(Merged, "Grouped", {"First_Sugg_Start"}),
// Filter to keep only the most recent data dump
LatestDate = List.Max(#"Changed Type"[Date]),
Filtered = Table.SelectRows(Expanded, each [Date] = LatestDate),
#"Reordered Columns" = Table.ReorderColumns(Filtered,{"Date", "Project#", "Sugg_Start", "First_Sugg_Start", "Task#", "Name"})
in
#"Reordered Columns"