Search code examples
google-sheetsgoogle-analyticsdata-analysisfirst-normal-form

How do you deal with multiple values in a single cell separated by comma?


Suppose you have a table that is setup like:

Student Fruits
Foo Apple,Banana
Bar Orange
Baz Lemon,Orange

What is your usual approach if you need to quantify/do analysis on Fruits?

One approach I've used is to create multiple helper columns, then split the values like so:

Student Fruits fruit1 fruit2
Foo Apple,Banana Apple Banana
Bar Orange Orange
Baz Lemon,Orange Lemon Orange

But I'm not sure if this is the best approach, as this leads to having both horizontally and vertically entered data. So if I need to, say find out the fruits for 'Foo' in a conventional way, I'd have to again restructure this so that each fruit is in its own row to make it "spreadsheet readable". It also doesn't translate too well if I need to create pivot tables.

Is there a better approach for this kind of task?


Solution

  • The data should preferably be normalized to the first-normal-form:

    Student Fruits
    Foo Apple
    Foo Banana
    Bar Orange
    Baz Lemon
    Baz Orange

    You can try formulas like the one below to make the transformation.

    =QUERY(
      WRAPROWS(
       FLATTEN(
        MAP(
         A1:A4,
         B1:B4,
         LAMBDA(a,b,
          TOROW(
           REDUCE(,SPLIT(b,","),
            LAMBDA(ac,cur,
              VSTACK(ac,{a,cur}))))))),2),
     "where Col1<>''")