Search code examples
excelsortingcolumnsorting

Sorting column data in Excel when one cell has multiple values seperated by a comma


I have two columns to look at here. In each corresponding cell, their are multiple values all seperated by a comma. Here's how it looks:

Column A (States) Column B (Dates)
(Cell A2): Ohio, Georgia, Alabama (Cell B2): 06-01-2021, 04-01-2021, 10-01-2021
(Cell A3): Alaska, Utah, California (Cell B3): 11-01-2021, 12-01-2021, 06-01-2021

So you can see that a single cell can have multiple sets of data that is separated by a comma. The state relates to the date, so in the cell A2 example: Ohio is 06-01-2021, Georgia is 04-01-2021, and Alabama is 10-01-2021.

The issue here is that I am needing the dates sorted from oldest to newest, and have the states adjust with that change. I have tried sorting, but since one cell contains multiple values that doesn't seem to work.

The reason the workbook is in this format is because I used a group_concat query in MySQL and now I am needing analysis here. I have tried for hours here and I'm stuck. Any ideas?

Expected output for Cell A2 and B2 would be this:

Column A (States) Column B (Dates)
(Cell A2): Georgia, Ohio, Alabama (Cell B2): 04-01-2021, 06-01-2021, 10-01-2021

The example above is correct because the dates are in the oldest to newest order, and the states adjusted with that as well. I have 30k rows of data here.


Solution

  • If you have Windows Office 365, you can use formulas,

    • FILTERXML converts the strings into arrays
    • SORT is used to sort the Dates in order
    • SORTBY is used to sort the States in the sorted date order

    If you have Office 365 but on a Mac, you won't have the FILTERXML function, but there are other ways to create the necessary array from the comma separated list

    Sorted States

    =TEXTJOIN(", ",,SORTBY(FILTERXML("<t><s>" & SUBSTITUTE(A2,",","</s><s>")& "</s></t>","//s"),(FILTERXML("<t><s>" & SUBSTITUTE(B2,",","</s><s>")& "</s></t>","//s"))))
    

    Sorted Dates

    =TEXTJOIN(", ",,TEXT(SORT(FILTERXML("<t><s>" & SUBSTITUTE(B2,",","</s><s>")& "</s></t>","//s")),"m/d/yyyy"))
    

    enter image description here

    You can also do this in Power Query, available in Windows Excel 2010+ and Office 365

    To use Power Query

    • Select some cell in your Data Table
    • Data => Get&Transform => from Table/Range
    • When the PQ Editor opens: Home => Advanced Editor
    • Make note of the Table Name in Line 2
    • Paste the M Code below in place of what you see
    • Change the Table name in line 2 back to what was generated originally.
    • Read the comments and explore the Applied Steps to understand the algorithm

    M Code

    let
        Source = Excel.CurrentWorkbook(){[Name="Table14"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"States", type text}, {"Dates", type text}}),
    
    /*Sort the string of Dates 
        Split the string into a List and trim each element
        Transform each list element into a "real date"
        Sort the list of dates
        Convert each date into a text string with the same format as original column
        Combine the list of dates back into a comma separated string
    */
        #"Added Custom" = Table.AddColumn(#"Changed Type", "Sorted Dates", each Text.Combine(
            List.Transform(
                List.Sort(
                    List.Transform(
                        List.Transform(
                            Text.Split([Dates],","), each Text.Trim(_)),
                        each Date.FromText(_))
                    ), each Date.ToText(_,"M/d/yyyy")),
                        ", "), Text.Type),
    
    /*Sort the string of states
        Split the string of states, string of dates, and string of sorted dates into a trimmed list
        Find the Position of each Date (from sorted List) in the Unsorted List
        Use that position as an Index into the unsorted State list to create a sorted state list
        Combine the sorted state list into a comma separated string
    */
        #"Added Custom1" = Table.AddColumn(#"Added Custom", "Sorted States", each 
            let 
                dts = List.Transform(Text.Split([Dates],","),each Text.Trim(_)),
                srtDts = List.Transform(Text.Split([Sorted Dates],","), each Text.Trim(_)),
                sts = List.Transform(Text.Split([States],","),each Text.Trim(_)),
                
                states = List.Generate(
                    ()=> [state=sts{List.PositionOf(dts,srtDts{0})}, idx=0],
                    each [idx] < List.Count(dts),
                    each [state = sts{List.PositionOf(dts,srtDts{[idx]+1})}, idx = [idx]+1],
                    each [state])
            in 
                Text.Combine(states,", "), Text.Type),
    
    //Remove unneeded column and reorder the date/state columns        
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"States", "Dates"}),
        #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Sorted States", "Sorted Dates"})
    in
        #"Reordered Columns"
    

    Original
    enter image description here

    Sorted
    enter image description here