
Format Multienumeration Data for Pivot Table


I'm relatively new to Excel pivot tables, and I'm trying to work out how to format a raw data dump from an external application so that, when I bring it into Excel for analysis, I can build different pivot tables and charts from the same dataset, or at least minimize the formatting I have to redo every time a new dump arrives.

Currently, I have a list of items (one per row), and certain columns can hold multiple enumerated values. For instance, the Color column could be "red" or "red, blue, black", and the Animal column could be "dog" or "dog, parrot".

| Element ID | Color            | Animal      | State |
|------------|------------------|-------------|-------|
| 1          | Red, Blue, Black | Dog, Parrot | KS    |
| 2          | Red, Black       | Dog, Cat    | NY    |
| 3          | Blue             | Parrot      | KS    |

I'm trying to create a pivot table with the animals as rows, the colors as columns, and, as values, the count of items (rows in my original dataset) that have each color/animal combination:

|        | Red | Blue | Black |
|--------|-----|------|-------|
| Dog    | 2   | 1    | 2     |
| Cat    | 1   | 0    | 1     |
| Parrot | 1   | 2    | 1     |

As far as I can tell from my research, there is no way to produce the pivot table above from raw data formatted this way, regardless of the delimiter I use, but please correct me if I'm wrong.

I understand that one way to make this analysis possible is to split each multi-valued column into separate rows: for example, three rows for the "red, blue, black" value in the Color column, and then the same for every other multi-valued column. For element 1 that gives:

| Element ID | Color | Animal |
|------------|-------|--------|
| 1          | Red   | Dog    |
| 1          | Blue  | Dog    |
| 1          | Black | Dog    |
| 1          | Red   | Parrot |
| 1          | Blue  | Parrot |
| 1          | Black | Parrot |

The problem is that when I use the same raw dataset to build pivot tables that count other properties (say State vs. Animal), an item that has been split into multiple rows is counted multiple times, which I do not want.
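To make the double counting concrete, here is a small Power Query M sketch. The rows are a hypothetical hand-built copy of elements 1 and 3 from the sample data (both in KS) after splitting both Color and Animal. Grouping by State shows that a plain row count gives 7 for KS even though only 2 items are in KS, while counting distinct Element ID values gives 2. When a pivot table's data is added to the Data Model, Excel's Value Field Settings also offer a Distinct Count summary, which applies the same idea inside the pivot table itself.

```powerquery
let
    // Hypothetical hand-built rows: elements 1 and 3 from the sample data
    // (both in KS) after splitting Color and Animal into separate rows
    Split = #table(
        type table [#"Element ID" = Int64.Type, Color = text, Animal = text, State = text],
        {
            {1, "Red", "Dog", "KS"},
            {1, "Blue", "Dog", "KS"},
            {1, "Black", "Dog", "KS"},
            {1, "Red", "Parrot", "KS"},
            {1, "Blue", "Parrot", "KS"},
            {1, "Black", "Parrot", "KS"},
            {3, "Blue", "Parrot", "KS"}
        }
    ),
    // Per State: plain row count vs. number of distinct items
    byState = Table.Group(
        Split,
        {"State"},
        {
            {"Rows", each Table.RowCount(_), Int64.Type},
            {"Items", each List.Count(List.Distinct([#"Element ID"])), Int64.Type}
        }
    )
in
    byState
```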

The other solution I considered is to take my single raw dataset and create different versions of it on separate sheets, each feeding a different pivot table. For instance, when counting against color I'd use a table that splits the "red, blue, black" entry into three rows, but when not counting against color I'd use a table on another sheet that keeps it as one row.

I know I could build the summary table above without pivot tables, using a matrix of COUNTIFS formulas, but I'd like to stick with pivot tables if possible, since they're much more accessible and easier to manage.

I feel like I'm missing something about pivot tables, and I suspect there's a much more elegant solution I'm just not aware of. Any help would be appreciated.


Solution

  • Using Power Query, you can keep a single source (the data dump table) plus a second, derived table that updates automatically:

    1. Import the data dump into Power Query. In this example it is imported from Table1 (after loading the dump into Excel and formatting it as a table).

    2. Create a new query that references this one, so any updates to Table1 are automatically reflected in the new query.

    3. Enter the following in the Advanced Editor to split the Color and Animal columns into rows and trim the values:

    let
      // Table1 holds the imported data dump (see step 1)
      Source = Table1,
      // State is not needed for the Color/Animal counts
      removedState = Table.RemoveColumns(Source, {"State"}),
      // Split each Color value at the commas and expand to one row per color
      splitColorIntoRows = Table.ExpandListColumn(
        Table.TransformColumns(
          Table.TransformColumnTypes(removedState, {{"Color", type text}}),
          {
            {
              "Color",
              Splitter.SplitTextByDelimiter(","),
              let
                itemType = (type nullable text) meta [Serialized.Text = true]
              in
                type {itemType}
            }
          }
        ),
        "Color"
      ),
      // Same for Animal; together the two splits give one row per item and Color/Animal pair
      splitAnimalIntoRows = Table.ExpandListColumn(
        Table.TransformColumns(
          Table.TransformColumnTypes(splitColorIntoRows, {{"Animal", type text}}),
          {
            {
              "Animal",
              Splitter.SplitTextByDelimiter(","),
              let
                itemType = (type nullable text) meta [Serialized.Text = true]
              in
                type {itemType}
            }
          }
        ),
        "Animal"
      ),
      // Remove the space left after each comma, e.g. " Blue" -> "Blue"
      trimmedCols = Table.TransformColumns(
        splitAnimalIntoRows,
        {
          {"Color", each Text.Trim(_), type nullable text},
          {"Animal", each Text.Trim(_), type nullable text}
        }
      )
    in
      trimmedCols
    
    4. Rename the new query (the loaded table takes its name), for example to tbColorAndAnimal.
    5. Close & Load the result into Excel.

    (Screenshot: the query in the Power Query Editor.)
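    To try the approach in a blank workbook, the following is a compact sketch of the same transformation rather than the exact query above. It assumes a hypothetical inline copy of the question's sample data (built with #table) in place of Table1, and it splits and trims each cell in one step with Text.Split and Text.Trim instead of Splitter.SplitTextByDelimiter followed by a separate trim. It also assumes every Color and Animal cell is non-blank; a null cell would need handling first.

    ```powerquery
    let
        // Hypothetical inline copy of the question's sample data, standing in for Table1
        Source = #table(
            type table [#"Element ID" = Int64.Type, Color = text, Animal = text, State = text],
            {
                {1, "Red, Blue, Black", "Dog, Parrot", "KS"},
                {2, "Red, Black", "Dog, Cat", "NY"},
                {3, "Blue", "Parrot", "KS"}
            }
        ),
        removedState = Table.RemoveColumns(Source, {"State"}),
        // Turn each cell into a list of trimmed values: "Red, Blue" -> {"Red", "Blue"}
        // (assumes no blank cells; a null value would need handling first)
        toLists = Table.TransformColumns(
            removedState,
            {
                {"Color", each List.Transform(Text.Split(_, ","), Text.Trim)},
                {"Animal", each List.Transform(Text.Split(_, ","), Text.Trim)}
            }
        ),
        // Expanding both list columns gives one row per item and Color/Animal pair
        expandedColor = Table.ExpandListColumn(toLists, "Color"),
        expandedAnimal = Table.ExpandListColumn(expandedColor, "Animal")
    in
        expandedAnimal
    ```

    Loaded to a sheet, this gives 11 rows: 6 for element 1, 4 for element 2 and 1 for element 3.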

    Use this table as the source for the Color vs. Animal pivot table, and use the original table (Table1) for other summaries.
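    The original table still holds combined values such as "Dog, Parrot" in the Animal column, so it works directly for summaries like a count of items per State, but not for State vs. Animal. A sketch for that case, under the same assumptions as a test setup (a hypothetical inline copy of the sample data standing in for Table1, no blank cells), splits only Animal and keeps State. A pivot table with State as rows, Animal as columns and Count of Element ID as values then counts each item once per animal.

    ```powerquery
    let
        // Hypothetical inline copy of the question's sample data; in the workbook,
        // use Source = Table1 instead
        Source = #table(
            type table [#"Element ID" = Int64.Type, Color = text, Animal = text, State = text],
            {
                {1, "Red, Blue, Black", "Dog, Parrot", "KS"},
                {2, "Red, Black", "Dog, Cat", "NY"},
                {3, "Blue", "Parrot", "KS"}
            }
        ),
        // Only Animal is split; Color is dropped and State is kept as-is
        removedColor = Table.RemoveColumns(Source, {"Color"}),
        // "Dog, Parrot" -> {"Dog", "Parrot"}; assumes no blank Animal cells
        animalLists = Table.TransformColumns(
            removedColor,
            {{"Animal", each List.Transform(Text.Split(_, ","), Text.Trim)}}
        ),
        // One row per item and animal, so each item is counted once per animal
        splitAnimal = Table.ExpandListColumn(animalLists, "Animal")
    in
        splitAnimal
    ```

    With the sample data this yields 5 rows, and the resulting pivot shows KS with 1 Dog and 2 Parrot items, and NY with 1 Dog and 1 Cat item.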

    Whenever the data changes, click Refresh All on the Data tab to refresh both the Power Query results and the pivot tables:

    (Screenshot: the table loaded from Power Query and a pivot table based on it.)