JOB_ID | DAY_ID |
---|---|
430 | 120 |
430 | 720 |
530 | 120 |
530 | 620 |
630 | 120 |
630 | 120 |
630 | 620 |
630 | 620 |
Should become:
JOB_ID | DAY_ID |
---|---|
430 | 720 |
530 | 620 |
630 | 620 |
630 | 620 |
Where only the rows where day_id is the maximum value is kept.
I'm currently using Excel's Power Query, so Python and R scripts are not available.
let
//Change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"JOB_ID", Int64.Type}, {"DAY_ID", Int64.Type}}),
//Group by JOB_ID
// Filter each subtable by maximum DAY_ID
#"Grouped Rows" = Table.Group(#"Changed Type",
{"JOB_ID"}, {
{"DAY_ID", (t)=>Table.SelectRows(t, each [DAY_ID]=List.Max(t[DAY_ID])),
type table [JOB_ID=Int64.Type, DAY_ID=Int64.Type]}}),
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"JOB_ID"}),
#"Expanded DAY_ID" = Table.ExpandTableColumn(#"Removed Columns", "DAY_ID", {"JOB_ID", "DAY_ID"})
in
#"Expanded DAY_ID"