I have a table with thousands of rows that looks somewhat like this:
name | sub_text | sub_code | other_data |
---|---|---|---|
a | text1 | code1 | a_data |
a | text2 | code2 | a_data |
a | text3 | code3 | a_data |
a | text4 | code4 | a_data |
b | text1 | code1 | b_data |
b | text2 | code2 | b_data |
c | c_data |
These records represent people and some of them have 4 or 2 rows where only the sub_text
and the sub_code
values differ. There are tens of other_data
columns (person's ID, birth date and place, etc) that are the same for those 4 or 2 rows. There are also rows that don't have any duplicates in which case the sub_text
and sub_code
values are empty.
I would like to convert this table into something like this:
name | sub_text1 | sub_text2 | sub_text3 | sub_text4 | sub_code1 | sub_code2 | sub_code3 | sub_code4 | other_data |
---|---|---|---|---|---|---|---|---|---|
a | text1 | text2 | text3 | text4 | code1 | code2 | code3 | code4 | a_data |
b | text1 | text2 | code1 | b_data | |||||
c | c_data |
I need to sort of "merge" the records so one person only has one record by adding the only differing values to new columns, while leaving the other_data
fields and the people who have only one record untouched.
This is originally an Excel table and I would also need the "converted" table in Excel. I don't think Excel can do something like that so I created a database (I only have it in Access now) hoping I could do more with SQL. I can't figure out how to solve this though. Any ideas?
In Power Query, available in Windows Excel 2010+ and Office 365, you can do this fairly easily by
name
and other_data
in your example)To use Power Query
Data => Get&Transform => from Table/Range
Home => Advanced Editor
Applied Steps
to understand the algorithmM-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
//Group by Name and Other Data
// extract a delimited list of the sub_text and sub_codes
#"Grouped Rows" = Table.Group(Source, {"name", "other_data"}, {
{"sub_text", each Text.Combine([sub_text],";"), type text},
{"sub_code", each Text.Combine([sub_code],";"), type text}
}),
//split the delimited lists into separate columns.
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "sub_text",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv)),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "sub_code",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv))
in
#"Split Column by Delimiter1"