I'm working with a CSV export of identity data containing ~22000 records. One of the fields included is titled 'ExtendedAttributes' and each cell in the column contains a quote bound string of comma separated Key:Value pairs. Each record in the file has an arbitrary number of extended attributes (up to around 50). My ultimate objective is to expand these extended attributes into their own columns in Excel (2016). I already have solutions for the expansions into columns from other data using formulae, simple VBA and most recently Power Query based approaches.
However, my previous solutions have all been based on the Key:Value pairs being simple to delimit. In this export, the ExtendedAttributes field has:
Value data that may contain unescaped/unquoted commas. e.g.
"Key1: Value1, name: surname, forename, Key2: Value2, ... "
Keys that may contain multiple comma separated values, which are also unquoted/unescaped. e.g.
"Key1: Value1, emailAlias: alias1@domain, alias2@domain, alias3@domain, Key2: Value2, ... "
My usual approach to this, where Key:Value pairs don't have these problems would be to delimit using commas to break it into the key value pairs, transpose the data into rows, then delimit using the colon to populate my new columns and their values as described here in the PowerBI community pages
This doesn't work here because delimiting using a comma breaks the values.
Is there a straightforward way to parse this into the constituent Key:Value pairs using (ideally) Power Query? Happy to also go with VBA or formula based solutions.
My instinctive approach would be to try and identify substrings containing a colon and prepend them with a unique character, which can then be used as a delimiter. (It's not impossible that the data may also include unescaped colons, but I'm happy to assume that it doesn't) But recognise that this may be a needlessly complex approach and I'm unsure how best to do it.
I'm happy to keep values with multiple comma separated items as a single unit (A problem for me to deal with later).
For the example data:
"Key1: Value1, name: surname, forename, emailAlias: alias1@domain, alias2@domain, alias3@domain, Key2: Value2, ... "
I'd like to end up with something that lets me treat the data like this, using maybe a ! as an example unique character that I could then use as a delimiter:
"Key1: Value1!name: surname, forename!emailAlias: alias1@domain, alias2@domain, alias3@domain!Key2: Value2!..."
I don't have access to the original data (vendor controlled system) and have limited data processing tools on my corporate desktop (Excel 2016, VBA, PQ). Appreciate any help.
In Power Query, you can define a function Partition
as follows:
let
Output = (str as text, sep as text) as text =>
Text.RemoveRange(
Text.Replace(
Text.Combine(
List.Transform(
Text.Split(str, " "),
each if Text.Contains(_, ":") then sep & _ else _
),
" "
), ", " & sep, sep
),
0, Text.Length(sep)
)
in
Output
Example text transformation using separator !
Starting text:
Key1: Value1, name: surname, forename, emailAlias: alias1@domain
Key1:
Value1,
name:
surname,
forename,
emailAlias:
alias1@domain
:
with separator !
!Key1:
Value1,
!name:
surname,
forename,
!emailAlias:
alias1@domain
!Key1: Value1, !name: surname, forename, !emailAlias: alias1@domain
, !
with !
!Key1: Value1!name: surname, forename!emailAlias: alias1@domain
Key1: Value1!name: surname, forename!emailAlias: alias1@domain
Once you have this function defined, you can call it in a column transformation that would look something like
= Table.TransformColumns(#"Prev Step", {{"ColName", each Partition(_,"!") , type text}})