Search code examples
excelpowerquerym

Generate a tag list in power query


I have a long list of article texts in ArticleList (1000+ items, with 50+ words each)

And I have a list of tags in TagList (100 items, 1 word each)

I want to return a list of matching tags (from TagList) for each word in ArticleList.

e.g.

ArticleList = {"the big yellow teapot"; "the small white cup"; "the medium brown pan"; "grey flask"}

TagList = {"the", "yellow", "teapot", "white", "pan"}

ResultsList = {"the, yellow, teapot"; "the, white"; "the, pan"; ""}

So basically any words not in TagList are removed from ArticleList and the ResultsList is left in the same order as original ArticleList.

NB: ArticleList is clean and comprises space separated words and case is not important.

I think I got someway towards a solution by splitting ArticleList into a List of Lists on the space delimiter, but I could not figure how to manipulate the Nested Lists from that point. Tried using List.Transform(ListofLists, each List.Intersect({_,TagList}){0}?), but it returns a single list instead of keeping the List of Lists integrity.


Solution

  • If I have a list called ArticleList, created with this M code:

    let
        Source = {"the big yellow teapot", "the small white cup", "the medium brown pan", "grey flask"}
    in
        Source
    

    enter image description here

    And a list called TagList, created with this M code:

    let
        Source = {"the", "yellow", "teapot", "white", "pan"}
    in
        Source
    

    enter image description here

    Then I can use this M code to do what you want:

    let
        Source = List.Transform(List.Transform(ArticleList, each Text.Split(_," ")), each List.Intersect({_,TagList})),
        #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Extracted Values" = Table.TransformColumns(#"Converted to Table", {"Column1", each Text.Combine(List.Transform(_, Text.From), ", "), type text})
    in
        #"Extracted Values"
    

    enter image description here