Search code examples
google-sheetsgoogle-sheets-formulapivot-tableunpivot

How to unpivot data that has multiple column groups in Google Sheets?


How do I transpose and transform this table:

Receipt date Amount

Child 1

Shirt size Age

Child 2

Shirt size Age

Child 3

Shirt size Age
5/1/2023 $39.00 John S 11-12 Mary L 14 Peter S 9
5/5/2023 (missing) Lilly M 14
(no receipt yet) $38.00 Tracy L 8 ½ Bill M 13 Jill M 11
Feb 17 Friday 2023 ₿0.00045 Michael S 10
5/8/2023 $15.00 Terry L 15

...into this desired stacked output?

Receipt date Amount

Label

Child

Shirt size Age
5/1/2023 $39.00 Child 1 John S 11-12
5/1/2023 $39.00 Child 2 Mary L 14
5/1/2023 $39.00 Child 3 Peter S 9
5/5/2023 (missing) Child 1 Lilly M 14
(no receipt yet) $38.00 Child 1 Tracy L 8 ½
(no receipt yet) $38.00 Child 2 Bill M 13
(no receipt yet) $38.00 Child 3 Jill M 11
Feb 17 Friday 2023 ₿0.00045 Child 1 Michael S 10
5/8/2023 $15.00 Child 1 Terry L 15

Note that source data columns C2:K are in groups of three, and that I need the labels Child 1, Child 2 and Child 3 be present in the result.

Existing questions such as How do you create a "reverse pivot" in Google Sheets only deal with source data where all unpivoted columns are treated the same. That does not work with the data above because its columns should be managed in groups of three, and each group of three values should be unpivoted together.

Further, many answers to reverse pivot questions use text string manipulation and split(), which does not work with my data because it can mistreat dates, Booleans and numbers in certain formats, together with text strings that look like those types. For example, split() will coerce the text string 11-12 in the data above to the numeric date 12 November 2023. The same thing happens with text strings like 1 2, 1 2 3 and 1111-2-3. Conversely, numbers and dates in custom number formats like ₿0.00045 and Feb 17 Friday 2023 get coerced into text strings.

Some existing answers stack data by using query(), but that function only accepts one data type per column. If a column contains a mix of text and numbers, or the occasional date or Boolean, the majority type will rule, and other types will be returned as nulls. The data above will miss several values if processed with query().

There are some answers that avoid the above pitfalls by using a custom function, but custom functions suffer from the "Google Sheets custom functions stuck in loading" issue, and will not work well when the spreadsheet is read with importrange() or published via File > Share > Publish to web.

I have several data tables like the one above, and they have column groups of different sizes and a different number of fixed columns each. An optimal solution would be easy to configure without modifying the formula logic.

How do I reverse pivot and transform the data above into the desired format without getting it mangled, with an easy-to-configure formula that does not use a custom function?


Solution

  • To unpivot the data in columns A1:K with 2 fixed columns in the left, followed by a Label column and a group of 3 unpivoted columns, use this pattern:

    =let( 
    
      data, A1:K, 
      numColsToRepeat, 2, 
      numColsPerGroup, 3, 
    
      subRows_, lambda(t, o, h, chooserows(t, sequence(h, 1, o))), 
      subCols_, lambda(t, o, w, choosecols(t, sequence(1, w, o))), 
      subTable_, lambda( 
        rowIndex, colIndex, height, width, 
        subRows_(subCols_(data, colIndex, width), rowIndex, height) 
      ), 
      label_, lambda(colIndex, subTable_(1, colIndex, 1, 1)), 
      headers, hstack( 
        subTable_(1, 1, 1, numColsToRepeat), 
        "Label", 
        subTable_(1, numColsToRepeat + 1, 1, numColsPerGroup) 
      ), 
      numGroups, (columns(data) - numColsToRepeat) / numColsPerGroup, 
      groupIndices, sequence(1, numGroups, numColsToRepeat + 1, numColsPerGroup), 
      nullRow, tocol(iferror(ø), true), 
      unpivot, reduce( 
        headers, sequence(rows(data) - 1, 1, 2), 
        lambda( 
          result, rowIndex, 
          let( 
            firstCols, subTable_(rowIndex, 1, 1, numColsToRepeat), 
            stackResult, reduce( 
              nullRow, groupIndices, 
              lambda(rowStack, colIndex, 
                let( 
                  group, subTable_(rowIndex, colIndex, 1, numColsPerGroup), 
                  if( 
                    single(group) = "", 
                    rowStack, 
                    vstack(rowStack, hstack(firstCols, label_(colIndex), group)) 
                  ) 
                ) 
              ) 
            ), 
            vstack(result, stackResult)  
          ) 
        ) 
      ), 
      unpivot 
    )
    

    To apply the formula to a data table that has different dimensions, modify the data, numColsToRepeat and numColsPerGroup parameters. The formula will work when data is a spreadsheet range and also when it is a computed array.

    The formula will weed out records where a value is missing in the first column of a column group. If you want to keep those records, replace the if() with its inner vstack().

    For unbeknownst reasons, some people prefer "one-liner" versions of formulas that look like line noise from a modem in days of yore. Those formulas are none the "more efficient", but if you want an unreadable version, you can use this:

    =let(d,A1:K,f,2,g,3,r_,lambda(t,o,h,chooserows(t,sequence(h,1,o))),c_,lambda(t,o,w,choosecols(t,sequence(1,w,o))),s_,lambda(r,c,h,w,r_(c_(d,c,w),r,h)),l_,lambda(c,s_(1,c,1,1)),hr,hstack(s_(1,1,1,f),"Label",s_(1,f+1,1,g)),i,sequence(1,(columns(d)-f)/g,f+1,g),n,tocol(iferror(ø),true),reduce(hr,sequence(rows(d)-1,1,2),lambda(z,r,let(e,s_(r,1,1,f),s,reduce(n,i,lambda(t,c,let(x,s_(r,c,1,g),if(single(x)="",t,vstack(t,hstack(e,l_(c),x)))))),vstack(z,s)))))

    See let(), offset() and reduce().