I have two columns to look at here. In each corresponding cell, their are multiple values all seperated by a comma. Here's how it looks:
Column A (States) | Column B (Dates) |
---|---|
(Cell A2): Ohio, Georgia, Alabama | (Cell B2): 06-01-2021, 04-01-2021, 10-01-2021 |
(Cell A3): Alaska, Utah, California | (Cell B3): 11-01-2021, 12-01-2021, 06-01-2021 |
So you can see that a single cell can have multiple sets of data that is separated by a comma. The state relates to the date, so in the cell A2 example: Ohio is 06-01-2021, Georgia is 04-01-2021, and Alabama is 10-01-2021.
The issue here is that I am needing the dates sorted from oldest to newest, and have the states adjust with that change. I have tried sorting, but since one cell contains multiple values that doesn't seem to work.
The reason the workbook is in this format is because I used a group_concat query in MySQL and now I am needing analysis here. I have tried for hours here and I'm stuck. Any ideas?
Expected output for Cell A2 and B2 would be this:
Column A (States) | Column B (Dates) |
---|---|
(Cell A2): Georgia, Ohio, Alabama | (Cell B2): 04-01-2021, 06-01-2021, 10-01-2021 |
The example above is correct because the dates are in the oldest to newest order, and the states adjusted with that as well. I have 30k rows of data here.
If you have Windows Office 365, you can use formulas,
FILTERXML
converts the strings into arraysSORT
is used to sort the Dates in orderSORTBY
is used to sort the States in the sorted date orderIf you have Office 365 but on a Mac, you won't have the FILTERXML
function, but there are other ways to create the necessary array from the comma separated list
Sorted States
=TEXTJOIN(", ",,SORTBY(FILTERXML("<t><s>" & SUBSTITUTE(A2,",","</s><s>")& "</s></t>","//s"),(FILTERXML("<t><s>" & SUBSTITUTE(B2,",","</s><s>")& "</s></t>","//s"))))
Sorted Dates
=TEXTJOIN(", ",,TEXT(SORT(FILTERXML("<t><s>" & SUBSTITUTE(B2,",","</s><s>")& "</s></t>","//s")),"m/d/yyyy"))
You can also do this in Power Query, available in Windows Excel 2010+ and Office 365
To use Power Query
Data => Get&Transform => from Table/Range
Home => Advanced Editor
Applied Steps
to understand the algorithmM Code
let
Source = Excel.CurrentWorkbook(){[Name="Table14"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"States", type text}, {"Dates", type text}}),
/*Sort the string of Dates
Split the string into a List and trim each element
Transform each list element into a "real date"
Sort the list of dates
Convert each date into a text string with the same format as original column
Combine the list of dates back into a comma separated string
*/
#"Added Custom" = Table.AddColumn(#"Changed Type", "Sorted Dates", each Text.Combine(
List.Transform(
List.Sort(
List.Transform(
List.Transform(
Text.Split([Dates],","), each Text.Trim(_)),
each Date.FromText(_))
), each Date.ToText(_,"M/d/yyyy")),
", "), Text.Type),
/*Sort the string of states
Split the string of states, string of dates, and string of sorted dates into a trimmed list
Find the Position of each Date (from sorted List) in the Unsorted List
Use that position as an Index into the unsorted State list to create a sorted state list
Combine the sorted state list into a comma separated string
*/
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Sorted States", each
let
dts = List.Transform(Text.Split([Dates],","),each Text.Trim(_)),
srtDts = List.Transform(Text.Split([Sorted Dates],","), each Text.Trim(_)),
sts = List.Transform(Text.Split([States],","),each Text.Trim(_)),
states = List.Generate(
()=> [state=sts{List.PositionOf(dts,srtDts{0})}, idx=0],
each [idx] < List.Count(dts),
each [state = sts{List.PositionOf(dts,srtDts{[idx]+1})}, idx = [idx]+1],
each [state])
in
Text.Combine(states,", "), Text.Type),
//Remove unneeded column and reorder the date/state columns
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"States", "Dates"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Sorted States", "Sorted Dates"})
in
#"Reordered Columns"