Search code examples
powerbipowerquerypowerbi-desktoppowerbi-datasource

Power Query: Replace Value based on other column


I have a dataset of Degree Programs and Colleges that I am working with, but several colleges have had name changes in recent years.

+------+------------------------------+------------------+
| Year |           College            |     Program      |
+------+------------------------------+------------------+
| 2018 | College of Arts & Humanities | BA Communication |
| 2017 | College of Arts & Humanities | BA Communication |
| 2016 | College of Arts & Sciences   | BA Communication |
+------+------------------------------+------------------+

What I want to do is replace the college name on each reference of the program with the college name from the latest year. In my example above, it would only change 2016 to "College of Arts & Humanities."

I'm still very new to power query and the like, so I haven't the slightest on how to handle this...

I can create a new column with the information, or replace text on the column, I just need a way that isn't manual.

Has anyone else ran into this? Or have an idea on how to approach it? Any advice would be greatly appreciated. :)


Update: Replace values won't work, as some colleges were split and there's not a simple A-B transition.

When I say "several" colleges, 1 college split into two, and 2 others were renamed. We're talking about several thousand records in one dataset, all different colleges. I don't have any sort of a reference table for the name changes. I've been working on this dataset for 2 days, and the easiest logic I can find to make this work is to replace the college name on each program with the college name from the latest year.


Final Update: Thanks to @user9264230. After updating the references in the provided code, it looks like that took care of the issue!


Solution

  • You need a key field that distinguishes one college from another. If it is field Program then use code

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Grouped Rows" = Table.Group(Source, {"Program"}, {{"Year", each List.Max([Year]), type number}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows",{"Program", "Year"},Source,{"Program", "Year"},"Source",JoinKind.LeftOuter),
    List = Table.ExpandTableColumn(#"Merged Queries", "Source", {"College"}, {"College"}),
    #"Merged Queries1" = Table.NestedJoin(Source,{"Program"},List,{"Program"},"List",JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries1", "List", {"College"}, {"College.1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table2",{"College"})
    in #"Removed Columns"
    

    Otherwise replace all instances of Program with name of desired field such as Key