Search code examples
powerbipowerquerydata-cleaning

How to clean location data in power bi


I've currently got two tables. I have one table with a list of locations as such:

Zagreb (Croatia)

Seattle, WA, USA

New York City, NY

Kazakhstan, Almaty

I also have a master list of 200k cities that looks as such:

Zagreb | Croatia

Seattle | USA

New York City | USA

Almaty | Kazakhstan

The output I want is to add a new column to the first table as below:

Zagreb (Croatia) | Croatia

Seattle, WA, USA | USA

New York City, NY | USA

Kazakhstan, Almaty | Kazakhstan

This updated from a live source that I can't control the data quality from so any solution must be dynamic.

Any ideas appreciated!


Solution

  • One possible approach would be to add a custom column to the first table that searches the string for any cities that appear in the second table City column.

     = Table.AddColumn(#"Changed Type", "City",
           (L) => List.Select(Cities[City], each Text.Contains(L[Location], _)))
    

    This gives a list of matching cities. Expand that list to get the following:

    Cities

    You can then merge with the Cities table (matching on the City columns from each table) to pull over the Country column.


    Here's the full text of my query from the advanced editor:

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WikpML0pNUtBwLspPLMlM1FSK1YlWCk5NLCnJSdVRCHfUUQgNdgQL+qWWK0TmF2UrOGeWVOoo+EWCRb0TqxKzM4pLEvN0FBxzchNLKpViYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Location = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type", "City", (L) => List.Select(Cities[City], each Text.Contains(L[Location], _))),
        #"Expanded City" = Table.ExpandListColumn(#"Added Custom", "City"),
        #"Merged Queries" = Table.NestedJoin(#"Expanded City",{"City"},Cities,{"City"},"Cities",JoinKind.LeftOuter),
        #"Expanded Cities" = Table.ExpandTableColumn(#"Merged Queries", "Cities", {"Country"}, {"Country"})
    in
        #"Expanded Cities"