Search code examples
powerbipowerquerym

Power Query - Remove the first and last rows based on the MIN and MAX value of a column within a grouped value


I have item sales data from a number of regions. For each region (North, West etc), there can be multiple sales items for each day. I would like to remove rows for the first (MIN) and last (MAX) of the DATE column for each REGION. In the dataset below, the 2 rows for the 'Fri 09 Dec 22 and the 2 rows for 'Mon 12 Dec 22' would be removed for the 'North' region. All rows between the MIN and MAX of the date column for that region should be kept. Again for the next region 'South', the rows for 'Tue 06 Dec 22' and 'Mon 12 Dec 22' should be removed and all rows between those 2 dates kept etc....

Region Date Product Amount
North Fri 09 Dec 22 Potato $75
North Fri 09 Dec 22 Potato $59
North Sat 10 Dec 22 Potato $131
North Sat 10 Dec 22 Apple $14
North Sun 11 Dec 22 Tomato $88
North Sun 11 Dec 22 Tomato $48
North Mon 12 Dec 22 Carrot $26
North Mon 12 Dec 22 Carrot $13
South Tue 06 Dec 22 Banana $69
South Tue 06 Dec 22 Banana $18
South Tue 06 Dec 22 Tomato $11
South Thu 08 Dec 22 Pineapple $8
South Thu 08 Dec 22 Pineapple $18
South Thu 08 Dec 22 Apple $115
South Sun 11 Dec 22 Pineapple $32
South Sun 11 Dec 22 Pineapple $29
South Mon 12 Dec 22 Carrot $40
West Wed 07 Dec 22 Plum $89
West Wed 07 Dec 22 Plum $144
West Wed 07 Dec 22 Apple $19
West Thu 08 Dec 22 Orange $33
West Thu 08 Dec 22 Orange $109
West Fri 09 Dec 22 Orange $61
West Fri 09 Dec 22 Orange $14
West Fri 09 Dec 22 Orange $89
West Fri 09 Dec 22 Apple $103
West Fri 09 Dec 22 Orange $2
West Fri 09 Dec 22 Orange $87
West Sat 10 Dec 22 Potato $106
West Sat 10 Dec 22 Apple $1
West Sun 11 Dec 22 Tomato $75
West Sun 11 Dec 22 Tomato $19
West Sun 11 Dec 22 Tomato $114
West Sun 11 Dec 22 Apple $61
West Mon 12 Dec 22 Apple $37
West Mon 12 Dec 22 Carrot $15
West Mon 12 Dec 22 Carrot $16

I'm able to remove the MIN and MAX values for the whole date range, but not for each REGION. If I create a grouping and flag the MIN and MAX for each region, I can get the right rows but the Product and Amount columns become lists - which if expanded to values create many duplicate rows:

let
Source = Excel.Workbook(File.Contents("C:\Temp\Test Data\2022 Sales.xlsx"), null, true),
Test_Sheet = Source{[Item="Test",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Test_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Region", type text}, {"Date", type date}, {"Time", type datetime}, {"Product", type text}, {"Amount", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Region"}, {
    {"Date", each [Date]},
    {"Product", each [Product]},
    {"Amount", each [Amount]},
    {"DontKeepMinFlag", each Table.First(_)[Date]},
    {"DontKeepMaxFlag", each Table.Last(_)[Date]}}),
#"Expanded Date" = Table.ExpandListColumn(#"Grouped Rows", "Date"),
#"Added Conditional Column" = Table.AddColumn(#"Expanded Date", "Custom", each if [DontKeepMinFlag] = [Date] then true else if [DontKeepMaxFlag] = [Date] then true else false),
#"Filtered Rows" = Table.SelectRows(#"Added Conditional Column", each ([Custom] = false)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"DontKeepMinFlag", "DontKeepMaxFlag", "Custom"})
in
    #"Removed Columns"

Solution

  • Thank you for providing a usable text example.

    You can do all the selecting in your Table.Group Aggregations, and then just expand the column of tables (except for the Region Column).

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdTJbsMgEAbgV0FWjzkwgG187KLeukiJlEOUA2pREikxEbXfv5imwYMaOvLFSJ8G5mfZbKpX54d9taie/YHxjj3ZDyZEGL+7wQwu/Ny1dbVdUGDdIbg0AwP+FwQJRXl/Ph9thAq7sWcAya3c6VJRayJUGL64AEWCj8Z7N0xQNEQIMsKlGyNcjZbxJsEH04dvgk1HhKCLMDUDgOF+ZFzPAj/01vxGqek0nz+zaXegRjBPHRWVgm4FTup29opHuLZf03BtPxlvZzWP4ym23lEUKFViqWtULQ/nzZt+99OwpDngqGB+vRJsgOZA0Zwuzpv65ZJWTxCnbeeu8EjwpgTT+pC6efEvr9i/Dm9vweGYc3hdHt61/ChfmWxLbPba1EQXwtt+Aw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Region = _t, Date = _t, Product = _t, Amount = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Region", type text}, {"Date", type date}, {"Product", type text}, {"Amount", Currency.Type}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"Region"}, {
            {"Trimmed", (t)=>Table.SelectRows(t, 
                each [Date] <> List.Min(t[Date]) and [Date] <> List.Max(t[Date])), 
                type table[Region=nullable text, Date=nullable date, Product=nullable text, Amount=nullable number]}}),
        #"Expanded Trimmed" = Table.ExpandTableColumn(#"Grouped Rows", "Trimmed", {"Date", "Product", "Amount"})
    in
        #"Expanded Trimmed"
    

    Results from your posted data
    enter image description here